Observational Memory

Added in: @mastra/memory@1.1.0

Observational Memory (OM) is Mastra's system for long-context agentic memory. Two background agents maintain an observation log that replaces raw message history as it grows: an Observer that watches conversations and creates observations, and a Reflector that restructures the log by combining related items, reflecting on overarching patterns, and condensing where possible.

Usage

src/mastra/agents/agent.ts
import { Memory } from "@mastra/memory";
import { Agent } from "@mastra/core/agent";

export const agent = new Agent({
  name: "my-agent",
  instructions: "You are a helpful assistant.",
  model: "openai/gpt-5-mini",
  memory: new Memory({
    options: {
      observationalMemory: true,
    },
  }),
});

Configuration

The observationalMemory option accepts true, false, or a configuration object.

Setting observationalMemory: true enables it with all defaults. Setting observationalMemory: false or omitting it disables it.
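
A configuration object is on by default, so `enabled: false` is the only way to keep the object around while switching the feature off. A minimal sketch, assuming the same Memory setup shown above:

import { Memory } from "@mastra/memory";

// A config object is enabled by default; `enabled: false` switches the feature off
// without removing the rest of the configuration.
const memory = new Memory({
  options: {
    observationalMemory: {
      enabled: false,
      scope: "resource",
    },
  },
});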

enabled?: boolean = true
Enable or disable Observational Memory. When omitted from a config object, defaults to `true`. Only `enabled: false` explicitly disables it.

model?: string | LanguageModel | DynamicModel | ModelWithRetries[] = 'google/gemini-2.5-flash'
Model for both the Observer and Reflector agents. Sets the model for both at once. Cannot be used together with `observation.model` or `reflection.model` — an error will be thrown if both are set.

scope?: 'resource' | 'thread' = 'thread'
Memory scope for observations. `'thread'` keeps observations per-thread. `'resource'` shares observations across all threads for a resource, enabling cross-conversation memory.

shareTokenBudget?: boolean = false
Share the token budget between messages and observations. When enabled, the total budget is `observation.messageTokens + reflection.observationTokens`. Messages can use more space when observations are small, and vice versa. This maximizes context usage through flexible allocation.

observation?: ObservationalMemoryObservationConfig
Configuration for the observation step. Controls when the Observer agent runs and how it behaves.

reflection?: ObservationalMemoryReflectionConfig
Configuration for the reflection step. Controls when the Reflector agent runs and how it behaves.

Observation config

model?: string | LanguageModel | DynamicModel | ModelWithRetries[] = 'google/gemini-2.5-flash'
Model for the Observer agent. Cannot be set if a top-level `model` is also provided.

messageTokens?: number = 30000
Token count of unobserved messages that triggers observation. When unobserved message tokens exceed this threshold, the Observer agent is called.

maxTokensPerBatch?: number = 10000
Maximum tokens per batch when observing multiple threads in resource scope. Threads are chunked into batches of this size and processed in parallel. Lower values mean more parallelism but more API calls.

modelSettings?: ObservationalMemoryModelSettings = { temperature: 0.3, maxOutputTokens: 100_000 }
Model settings for the Observer agent.
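
For instance, a resource-scoped setup that observes more eagerly and caps each batch might look like the following sketch (the values are illustrative, not recommendations; the options are the ones documented on this page):

import { Memory } from "@mastra/memory";

// Sketch: resource scope with a lower observation trigger and smaller batches,
// trading more (parallel) Observer calls for smaller per-call payloads.
const memory = new Memory({
  options: {
    observationalMemory: {
      scope: "resource",
      observation: {
        messageTokens: 20_000, // observe once unobserved messages exceed ~20k tokens
        maxTokensPerBatch: 5_000, // chunk threads into smaller parallel batches
      },
    },
  },
});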

Reflection config

model?: string | LanguageModel | DynamicModel | ModelWithRetries[] = 'google/gemini-2.5-flash'
Model for the Reflector agent. Cannot be set if a top-level `model` is also provided.

observationTokens?: number = 40000
Token count of observations that triggers reflection. When observation tokens exceed this threshold, the Reflector agent is called to condense them.

modelSettings?: ObservationalMemoryModelSettings = { temperature: 0, maxOutputTokens: 100_000 }
Model settings for the Reflector agent.

Model settings

temperature?: number = 0.3
Temperature for generation. Lower values produce more consistent output.

maxOutputTokens?: number = 100000
Maximum output tokens. Set high to prevent truncation of observations.
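
These settings are passed per agent through `observation.modelSettings` and `reflection.modelSettings`. A minimal sketch (the specific values are illustrative):

import { Memory } from "@mastra/memory";

// Sketch: override the default generation settings for each background agent.
const memory = new Memory({
  options: {
    observationalMemory: {
      observation: {
        modelSettings: { temperature: 0.1, maxOutputTokens: 50_000 },
      },
      reflection: {
        modelSettings: { temperature: 0, maxOutputTokens: 50_000 },
      },
    },
  },
});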

Examples

Resource scope with custom thresholds

src/mastra/agents/agent.ts
import { Memory } from "@mastra/memory";
import { Agent } from "@mastra/core/agent";

export const agent = new Agent({
  name: "my-agent",
  instructions: "You are a helpful assistant.",
  model: "openai/gpt-5-mini",
  memory: new Memory({
    options: {
      observationalMemory: {
        scope: "resource",
        observation: {
          messageTokens: 20_000,
        },
        reflection: {
          observationTokens: 60_000,
        },
      },
    },
  }),
});

Shared token budget

src/mastra/agents/agent.ts
import { Memory } from "@mastra/memory";
import { Agent } from "@mastra/core/agent";

export const agent = new Agent({
  name: "my-agent",
  instructions: "You are a helpful assistant.",
  model: "openai/gpt-5-mini",
  memory: new Memory({
    options: {
      observationalMemory: {
        shareTokenBudget: true,
        observation: {
          messageTokens: 20_000,
        },
        reflection: {
          observationTokens: 80_000,
        },
      },
    },
  }),
});

When shareTokenBudget is enabled, the total budget is observation.messageTokens + reflection.observationTokens (100k in this example). If observations only use 30k tokens, messages can expand to use up to 70k. If messages are short, observations have more room before triggering reflection.

Custom model

src/mastra/agents/agent.ts
import { Memory } from "@mastra/memory";
import { Agent } from "@mastra/core/agent";

export const agent = new Agent({
  name: "my-agent",
  instructions: "You are a helpful assistant.",
  model: "openai/gpt-5-mini",
  memory: new Memory({
    options: {
      observationalMemory: {
        model: "openai/gpt-4o-mini",
      },
    },
  }),
});

Different models per agent

src/mastra/agents/agent.ts
import { Memory } from "@mastra/memory";
import { Agent } from "@mastra/core/agent";

export const agent = new Agent({
  name: "my-agent",
  instructions: "You are a helpful assistant.",
  model: "openai/gpt-5-mini",
  memory: new Memory({
    options: {
      observationalMemory: {
        observation: {
          model: "google/gemini-2.5-flash",
        },
        reflection: {
          model: "openai/gpt-4o-mini",
        },
      },
    },
  }),
});

Standalone usage

Most users should use the Memory class above. Using ObservationalMemory directly is mainly useful for benchmarking, experimentation, or when you need to control its ordering relative to other processors (such as guardrails).

src/mastra/agents/agent.ts
import { ObservationalMemory } from "@mastra/memory/processors";
import { Agent } from "@mastra/core/agent";
import { LibSQLStore } from "@mastra/libsql";

const storage = new LibSQLStore({
  id: "my-storage",
  url: "file:./memory.db",
});

const om = new ObservationalMemory({
  storage: storage.stores.memory,
  model: "google/gemini-2.5-flash",
  scope: "resource",
  observation: {
    messageTokens: 20_000,
  },
  reflection: {
    observationTokens: 60_000,
  },
});

export const agent = new Agent({
  name: "my-agent",
  instructions: "You are a helpful assistant.",
  model: "openai/gpt-5-mini",
  inputProcessors: [om],
  outputProcessors: [om],
});

Standalone config

The standalone ObservationalMemory class accepts all the same options as the observationalMemory config object above, plus the following:

storage: MemoryStorage
Storage adapter for persisting observations. Must be a MemoryStorage instance (from `MastraStorage.stores.memory`).

onDebugEvent?: (event: ObservationDebugEvent) => void
Debug callback for observation events. Called whenever observation-related events occur. Useful for debugging and understanding the observation flow.

obscureThreadIds?: boolean = false
When enabled, thread IDs are hashed before being included in observation context. This prevents the LLM from recognizing patterns in thread identifiers. Automatically enabled when using resource scope through the Memory class.
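
As a sketch, the standalone example above could add a debug callback and hashed thread IDs like this (the logging is illustrative; the options are the ones documented in this section):

import { ObservationalMemory } from "@mastra/memory/processors";
import { LibSQLStore } from "@mastra/libsql";

const storage = new LibSQLStore({
  id: "my-storage",
  url: "file:./memory.db",
});

const om = new ObservationalMemory({
  storage: storage.stores.memory,
  scope: "resource",
  // Hash thread IDs before they appear in observation context.
  obscureThreadIds: true,
  // Log observation-related events while developing; `event` is an ObservationDebugEvent.
  onDebugEvent: (event) => {
    console.log("[observational-memory]", event);
  },
});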