Observational Memory
Added in: @mastra/memory@1.1.0
Observational Memory (OM) is Mastra's memory system for long-context agentic memory. Two background agents maintain an observation log that replaces raw message history as it grows: an Observer that watches conversations and records observations, and a Reflector that restructures the log by combining related items, reflecting on overarching patterns, and condensing where possible.
Usage
```typescript
import { Memory } from '@mastra/memory'
import { Agent } from '@mastra/core/agent'

export const agent = new Agent({
  name: 'my-agent',
  instructions: 'You are a helpful assistant.',
  model: 'openai/gpt-5-mini',
  memory: new Memory({
    options: {
      observationalMemory: true,
    },
  }),
})
```
Configuration
The observationalMemory option accepts true, a configuration object, or false. Setting true enables OM with google/gemini-2.5-flash as the default model. When passing a config object, a model must be explicitly set — either at the top level, or on observation.model and/or reflection.model.
Observer input is multimodal-aware. OM keeps text placeholders like [Image #1: screenshot.png] in the transcript it builds for the Observer, and also sends the underlying image parts when possible. This applies to both single-thread observation and batched multi-thread observation. Non-image files appear as placeholders only.
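As a concrete illustration of the placeholder format described above, here is a hypothetical helper (not OM's actual transcript builder; the `[File: …]` format for non-image files is an assumption for illustration):

```typescript
// Hypothetical helper illustrating the placeholder format described above;
// not the actual OM transcript builder.
type TranscriptPart =
  | { type: 'text'; text: string }
  | { type: 'image'; filename: string }
  | { type: 'file'; filename: string }

function toTranscriptText(part: TranscriptPart, imageIndex: number): string {
  switch (part.type) {
    case 'text':
      return part.text
    case 'image':
      // Images keep a numbered text placeholder; the underlying image parts
      // may also be sent to the Observer when possible
      return `[Image #${imageIndex}: ${part.filename}]`
    case 'file':
      // Non-image files appear as placeholders only (hypothetical format)
      return `[File: ${part.filename}]`
  }
}
```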
OM performs thresholding with fast local token estimation. Text uses tokenx, and image-like inputs use provider-aware heuristics plus deterministic fallbacks when metadata is incomplete.
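A minimal sketch of what a deterministic text fallback can look like (the ~4 characters per token ratio is a common rule of thumb, not OM's exact heuristic):

```typescript
// Rough deterministic fallback (hypothetical numbers, not OM's actual heuristics):
// estimate tokens from character count when no tokenizer result is available.
function fallbackTextTokenEstimate(text: string): number {
  // ~4 characters per token is a common rule of thumb for English text
  return Math.ceil(text.length / 4)
}
```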
enabled?:
model?:
scope?:
retrieval?:
observation?:
model?:
instruction?:
threadTitle?:
messageTokens?:
maxTokensPerBatch?:
modelSettings?:
temperature?:
maxOutputTokens?:
bufferTokens?:
bufferActivation?:
blockAfter?:
previousObserverTokens?:
reflection?:
model?:
instruction?:
observationTokens?:
modelSettings?:
temperature?:
maxOutputTokens?:
bufferActivation?:
blockAfter?:
Token estimate metadata cache
OM persists token estimates for message payloads so repeated counting can reuse prior estimation work.
- Part-level cache: `part.providerMetadata.mastra`
- String-content fallback cache: message-level metadata when no parts exist
- Cache entries are ignored and recomputed if the cache version/tokenizer source doesn't match
- Per-message and per-conversation overhead are always recomputed at runtime and aren't cached
- `data-*` and `reasoning` parts are skipped and don't receive cache entries
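Conceptually, a cached estimate is only trusted when its version and tokenizer source match. The sketch below uses hypothetical field names, not Mastra's actual metadata schema:

```typescript
// Conceptual sketch (hypothetical shapes, not Mastra's actual metadata schema):
// a cached token estimate is reused only when its version and tokenizer source match.
interface TokenEstimateCacheEntry {
  version: number
  tokenizer: string // e.g. 'tokenx'
  tokens: number
}

const CACHE_VERSION = 1
const TOKENIZER = 'tokenx'

function readCachedEstimate(
  entry: TokenEstimateCacheEntry | undefined,
  recompute: () => number,
): number {
  // Mismatched version or tokenizer source: ignore the entry and recompute
  if (!entry || entry.version !== CACHE_VERSION || entry.tokenizer !== TOKENIZER) {
    return recompute()
  }
  return entry.tokens
}
```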
Examples
Resource scope with custom thresholds (experimental)
```typescript
import { Memory } from '@mastra/memory'
import { Agent } from '@mastra/core/agent'

export const agent = new Agent({
  name: 'my-agent',
  instructions: 'You are a helpful assistant.',
  model: 'openai/gpt-5-mini',
  memory: new Memory({
    options: {
      observationalMemory: {
        model: 'google/gemini-2.5-flash',
        scope: 'resource',
        observation: {
          messageTokens: 20_000,
        },
        reflection: {
          observationTokens: 60_000,
        },
      },
    },
  }),
})
```
Shared token budget
When shareTokenBudget is enabled, the total budget is observation.messageTokens + reflection.observationTokens (100k in this example). If observations only use 30k tokens, messages can expand to use up to 70k. If messages are short, observations have more room before triggering reflection.
```typescript
import { Memory } from '@mastra/memory'
import { Agent } from '@mastra/core/agent'

export const agent = new Agent({
  name: 'my-agent',
  instructions: 'You are a helpful assistant.',
  model: 'openai/gpt-5-mini',
  memory: new Memory({
    options: {
      observationalMemory: {
        shareTokenBudget: true,
        observation: {
          messageTokens: 20_000,
          bufferTokens: false, // required when using shareTokenBudget (temporary limitation)
        },
        reflection: {
          observationTokens: 80_000,
        },
      },
    },
  }),
})
```
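The budget arithmetic described above can be sketched as follows (an illustrative helper, not part of the Mastra API):

```typescript
// Illustrative sketch of the shared token budget (hypothetical helper, not Mastra API).
// With messageTokens: 20_000 and observationTokens: 80_000, the pooled budget is 100k.
function remainingMessageBudget(
  messageTokens: number,
  observationTokens: number,
  observationTokensUsed: number,
): number {
  const totalBudget = messageTokens + observationTokens
  // Messages may expand into whatever portion of the pool observations aren't using
  return totalBudget - observationTokensUsed
}
```

If observations currently hold 30k tokens, `remainingMessageBudget(20_000, 80_000, 30_000)` leaves 70k for messages, matching the prose above.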
Custom model
By passing a model in the config, you can use any model from Mastra's model router.
```typescript
import { Memory } from '@mastra/memory'
import { Agent } from '@mastra/core/agent'

export const agent = new Agent({
  name: 'my-agent',
  instructions: 'You are a helpful assistant.',
  model: 'openai/gpt-5.4',
  memory: new Memory({
    options: {
      observationalMemory: {
        model: 'openai/gpt-5-mini',
      },
    },
  }),
})
```
Different models per agent
```typescript
import { Memory } from '@mastra/memory'
import { Agent } from '@mastra/core/agent'

export const agent = new Agent({
  name: 'my-agent',
  instructions: 'You are a helpful assistant.',
  model: 'openai/gpt-5.4',
  memory: new Memory({
    options: {
      observationalMemory: {
        observation: {
          model: 'google/gemini-2.5-flash',
        },
        reflection: {
          model: 'openai/gpt-5-mini',
        },
      },
    },
  }),
})
```
Custom instructions
Customize what the Observer and Reflector focus on by providing custom instructions:
```typescript
import { Memory } from '@mastra/memory'
import { Agent } from '@mastra/core/agent'

export const agent = new Agent({
  name: 'health-assistant',
  instructions: 'You are a health and wellness assistant.',
  model: 'openai/gpt-5.4',
  memory: new Memory({
    options: {
      observationalMemory: {
        model: 'google/gemini-2.5-flash',
        observation: {
          // Focus observations on health-related preferences and goals
          instruction:
            'Prioritize capturing user health goals, dietary restrictions, exercise preferences, and medical considerations. Avoid capturing general chit-chat.',
        },
        reflection: {
          // Guide reflection to consolidate health patterns
          instruction:
            'When consolidating, group related health information together. Preserve specific metrics, dates, and medical details.',
        },
      },
    },
  }),
})
```
Async buffering
Async buffering is enabled by default. It pre-computes observations in the background as the conversation grows — when the messageTokens threshold is reached, buffered observations activate instantly with no blocking LLM call.
The lifecycle is: buffer → activate → remove messages → repeat. Background Observer calls run at bufferTokens intervals, each producing a chunk of observations. At threshold, chunks activate: observations move into the log, raw messages are removed from context. The blockAfter threshold forces a synchronous fallback if buffering can't keep up.
Default settings:
- `observation.bufferTokens: 0.2` buffers every 20% of `messageTokens` (e.g. every ~6k tokens with a 30k threshold).
- `observation.bufferActivation: 0.8` removes enough messages on activation to keep only 20% of the threshold remaining.
- `reflection.bufferActivation: 0.5` starts background reflection at 50% of the observation threshold.
- Buffered observations include continuation hints (`suggestedResponse`, `currentTask`) that survive activation to maintain conversational continuity.
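The default ratios translate into token counts like this (an illustrative sketch, not the actual implementation):

```typescript
// Illustrative arithmetic for the buffering defaults (not the actual implementation).
function bufferIntervalTokens(messageTokens: number, bufferTokens: number): number {
  // Fractional values are treated here as a ratio of messageTokens;
  // absolute values pass through unchanged
  return bufferTokens < 1 ? messageTokens * bufferTokens : bufferTokens
}

function tokensKeptAfterActivation(messageTokens: number, bufferActivation: number): number {
  // bufferActivation: 0.8 removes 80% of the threshold, keeping 20%
  return messageTokens * (1 - bufferActivation)
}
```

With a 30k `messageTokens` threshold, the defaults buffer roughly every 6k tokens and keep about 6k tokens of messages after activation.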
To customize:
```typescript
import { Memory } from '@mastra/memory'
import { Agent } from '@mastra/core/agent'

export const agent = new Agent({
  name: 'my-agent',
  instructions: 'You are a helpful assistant.',
  model: 'openai/gpt-5-mini',
  memory: new Memory({
    options: {
      observationalMemory: {
        model: 'google/gemini-2.5-flash',
        observation: {
          messageTokens: 30_000,
          // Buffer every 5k tokens (runs in background)
          bufferTokens: 5_000,
          // Activate to retain 30% of threshold
          bufferActivation: 0.7,
          // Force synchronous observation at 1.5x threshold
          blockAfter: 1.5,
        },
        reflection: {
          observationTokens: 60_000,
          // Start background reflection at 50% of threshold
          bufferActivation: 0.5,
          // Force synchronous reflection at 1.2x threshold
          blockAfter: 1.2,
        },
      },
    },
  }),
})
```
To disable async buffering entirely:
```typescript
observationalMemory: {
  model: "google/gemini-2.5-flash",
  observation: {
    bufferTokens: false,
  },
}
```
Setting bufferTokens: false disables both observation and reflection async buffering. Observations and reflections will run synchronously when their thresholds are reached.
Async buffering isn't supported with scope: 'resource' and is automatically disabled in resource scope.
Streaming data parts
Observational Memory emits typed data parts during agent execution that clients can use for real-time UI feedback. These are streamed alongside the agent's response.
data-om-status
Emitted once per agent loop step, before model generation. Provides a snapshot of the current memory state, including token usage for both context windows and the state of any async buffered content.
```typescript
interface DataOmStatusPart {
  type: 'data-om-status'
  data: {
    windows: {
      active: {
        /** Unobserved message tokens and the threshold that triggers observation */
        messages: { tokens: number; threshold: number }
        /** Observation tokens and the threshold that triggers reflection */
        observations: { tokens: number; threshold: number }
      }
      buffered: {
        observations: {
          /** Number of buffered chunks staged for activation */
          chunks: number
          /** Total message tokens across all buffered chunks */
          messageTokens: number
          /** Projected message tokens that would be removed if activation happened now (based on bufferActivation ratio and chunk boundaries) */
          projectedMessageRemoval: number
          /** Observation tokens that will be added on activation */
          observationTokens: number
          /** idle: no buffering in progress. running: background observer is working. complete: chunks are ready for activation. */
          status: 'idle' | 'running' | 'complete'
        }
        reflection: {
          /** Observation tokens that were fed into the reflector (pre-compression size) */
          inputObservationTokens: number
          /** Observation tokens the reflection will produce on activation (post-compression size) */
          observationTokens: number
          /** idle: no reflection buffered. running: background reflector is working. complete: reflection is ready for activation. */
          status: 'idle' | 'running' | 'complete'
        }
      }
    }
    recordId: string
    threadId: string
    stepNumber: number
    /** Increments each time the Reflector creates a new generation */
    generationCount: number
  }
}
```
buffered.reflection.inputObservationTokens is the size of the observations that were sent to the Reflector. buffered.reflection.observationTokens is the compressed result — the size of what will replace those observations when the reflection activates. A client can use these two values to show a compression ratio.
Clients can derive percentages and post-activation estimates from the raw values:
```typescript
// Message window usage %
const msgPercent = status.windows.active.messages.tokens / status.windows.active.messages.threshold

// Observation window usage %
const obsPercent =
  status.windows.active.observations.tokens / status.windows.active.observations.threshold

// Projected message tokens after buffered observations activate.
// Uses projectedMessageRemoval, which accounts for bufferActivation ratio and chunk boundaries.
const postActivation =
  status.windows.active.messages.tokens -
  status.windows.buffered.observations.projectedMessageRemoval

// Reflection compression ratio (when buffered reflection exists)
const { inputObservationTokens, observationTokens } = status.windows.buffered.reflection
if (inputObservationTokens > 0) {
  const compressionRatio = observationTokens / inputObservationTokens
}
```
data-om-observation-start
Emitted when the Observer or Reflector agent begins processing.
cycleId:
operationType:
startedAt:
tokensToObserve:
recordId:
threadId:
threadIds:
config:
data-om-observation-end
Emitted when observation or reflection completes successfully.
cycleId:
operationType:
completedAt:
durationMs:
tokensObserved:
observationTokens:
observations?:
currentTask?:
suggestedResponse?:
recordId:
threadId:
data-om-observation-failed
Emitted when observation or reflection fails. The system falls back to synchronous processing.
cycleId:
operationType:
failedAt:
durationMs:
tokensAttempted:
error:
observations?:
recordId:
threadId:
data-om-buffering-start
Emitted when async buffering begins in the background. Buffering pre-computes observations or reflections before the main threshold is reached.
cycleId:
operationType:
startedAt:
tokensToBuffer:
recordId:
threadId:
threadIds:
config:
data-om-buffering-end
Emitted when async buffering completes. The content is stored but not yet activated in the main context.
cycleId:
operationType:
completedAt:
durationMs:
tokensBuffered:
bufferedTokens:
observations?:
recordId:
threadId:
data-om-buffering-failed
Emitted when async buffering fails. The system falls back to synchronous processing when the threshold is reached.
cycleId:
operationType:
failedAt:
durationMs:
tokensAttempted:
error:
observations?:
recordId:
threadId:
data-om-activation
Emitted when buffered observations or reflections are activated (moved into the active context window). This is an instant operation — no LLM call is involved.
cycleId:
operationType:
activatedAt:
chunksActivated:
tokensActivated:
observationTokens:
messagesActivated:
generationCount:
observations?:
recordId:
threadId:
config:
Standalone usage
Most users should use the Memory class above. Using ObservationalMemory directly is mainly useful for benchmarking, experimentation, or when you need to control processor ordering with other processors (like guardrails).
```typescript
import { ObservationalMemory } from '@mastra/memory/processors'
import { Agent } from '@mastra/core/agent'
import { LibSQLStore } from '@mastra/libsql'

const storage = new LibSQLStore({
  id: 'my-storage',
  url: 'file:./memory.db',
})

const om = new ObservationalMemory({
  storage: storage.stores.memory,
  model: 'google/gemini-2.5-flash',
  scope: 'resource',
  observation: {
    messageTokens: 20_000,
  },
  reflection: {
    observationTokens: 60_000,
  },
})

export const agent = new Agent({
  name: 'my-agent',
  instructions: 'You are a helpful assistant.',
  model: 'openai/gpt-5-mini',
  inputProcessors: [om],
  outputProcessors: [om],
})
```
Standalone config
The standalone ObservationalMemory class accepts all the same options as the observationalMemory config object above, plus the following:
storage:
onDebugEvent?:
obscureThreadIds?:
Recall tool
When retrieval is set (any truthy value), a recall tool is registered so the agent can page through raw messages behind observation group ranges. By default (scope 'resource'), the tool supports listing threads (mode: "threads"), browsing other threads (threadId), and cross-thread search. With retrieval: { vector: true }, semantic search is available (mode: "search"). Set scope: 'thread' to restrict the tool to the current thread only. The tool is automatically added to the agent's tool list — no manual registration is needed.
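For example, a configuration that registers the recall tool with semantic search enabled might look like this (mirroring the earlier examples; `retrieval: { vector: true }` is the only OM-specific addition):

```typescript
import { Memory } from '@mastra/memory'
import { Agent } from '@mastra/core/agent'

export const agent = new Agent({
  name: 'my-agent',
  instructions: 'You are a helpful assistant.',
  model: 'openai/gpt-5-mini',
  memory: new Memory({
    options: {
      observationalMemory: {
        model: 'google/gemini-2.5-flash',
        // Any truthy retrieval value registers the recall tool;
        // vector: true additionally enables semantic search (mode: "search")
        retrieval: { vector: true },
      },
    },
  }),
})
```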
Parameters
mode?:
query?:
cursor?:
threadId?:
page?:
limit?:
detail?:
partIndex?:
before?:
after?:
Returns (messages mode)
messages:
count:
cursor:
page:
limit:
hasNextPage:
hasPrevPage:
truncated?:
tokenOffset?:
Returns (threads mode)
threads:
count:
page:
hasMore:
Returns (search mode)
results:
count:
ModelByInputTokens
ModelByInputTokens selects a model based on the input token count. It chooses the model for the smallest threshold that covers the actual input size.
Constructor
new ModelByInputTokens(config)
Where config is an object with an upTo map whose keys are token thresholds (numbers) and whose values are model targets.
Example
```typescript
import { ModelByInputTokens } from '@mastra/memory'

const selector = new ModelByInputTokens({
  upTo: {
    10_000: 'google/gemini-2.5-flash', // Fast for small inputs
    40_000: 'openai/gpt-5.4-mini', // Stronger for medium inputs
    1_000_000: 'openai/gpt-5.4', // Most capable for large inputs
  },
})
```
Behavior
- Thresholds are sorted internally, so the order in the config object doesn't matter.
- inputTokens ≤ smallest threshold: that threshold's model is used.
- inputTokens > largest threshold: resolve() throws an error. If this happens during an OM Observer or Reflector run, OM aborts via TripWire, so callers receive an empty text result or a streamed tripwire instead of a normal assistant response.
- OM computes the input token count for the Observer or Reflector call and resolves the matching model tier directly.
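The selection rule can be sketched like this (illustrative only, not the library's implementation):

```typescript
// Sketch of the selection rule: pick the model for the smallest threshold
// that covers the input size (illustrative, not the library's code).
function resolveModel(upTo: Record<number, string>, inputTokens: number): string {
  const thresholds = Object.keys(upTo)
    .map(Number)
    .sort((a, b) => a - b) // order in the config object doesn't matter
  for (const threshold of thresholds) {
    if (inputTokens <= threshold) return upTo[threshold]
  }
  // Past the largest threshold: resolve() throws (OM would abort via TripWire)
  throw new Error(`No model configured for ${inputTokens} input tokens`)
}
```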