SemanticRecall

SemanticRecall is a hybrid processor that enables semantic search over conversation history using vector embeddings. On input, it performs a semantic search to find relevant historical messages; on output, it creates embeddings for new messages so they can be recalled later.

Usage example

import { SemanticRecall } from "@mastra/core/processors";
import { openai } from "@ai-sdk/openai";

const processor = new SemanticRecall({
  storage: memoryStorage,
  vector: vectorStore,
  embedder: openai.embedding("text-embedding-3-small"),
  topK: 5,
  messageRange: 2,
  scope: "resource",
});
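
Here memoryStorage and vectorStore stand for pre-configured MemoryStorage and MastraVector instances; see the extended example below for concrete Postgres-backed implementations.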

Constructor parameters

options: SemanticRecallOptions
Configuration options for the semantic recall processor.

Options

storage: MemoryStorage
Storage instance for retrieving messages.

vector: MastraVector
Vector store for semantic search.

embedder: MastraEmbeddingModel<string>
Embedder for generating query embeddings.

topK?: number
Number of most similar messages to retrieve.

messageRange?: number | { before: number; after: number }
Number of context messages to include before and after each match. A single number applies the same window to both sides; an object sets them separately.

scope?: 'thread' | 'resource'
Scope of semantic search: 'thread' searches within the current thread only; 'resource' searches across all threads for the resource.

threshold?: number
Minimum similarity score threshold (0-1). Messages scoring below this threshold are filtered out.

indexName?: string
Index name for the vector store. If not provided, it is auto-generated from the embedder model.

logger?: IMastraLogger
Optional logger instance for structured logging.
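
As a quick illustration of the two messageRange forms, the sketch below constructs two processors; baseOptions is a hypothetical object holding the required storage, vector, and embedder fields, not a Mastra export.

import { SemanticRecall } from "@mastra/core/processors";

// A single number applies the same context window on both sides of each match...
const symmetric = new SemanticRecall({ ...baseOptions, messageRange: 2 });

// ...while an object sets the before/after windows independently.
const asymmetric = new SemanticRecall({
  ...baseOptions,
  messageRange: { before: 2, after: 1 },
});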

Returns

id: string
Processor identifier, set to 'semantic-recall'.

name: string
Processor display name, set to 'SemanticRecall'.

processInput: (args: { messages: MastraDBMessage[]; messageList: MessageList; abort: (reason?: string) => never; tracingContext?: TracingContext; requestContext?: RequestContext }) => Promise<MessageList | MastraDBMessage[]>
Performs semantic search over historical messages and adds relevant context to the message list.

processOutputResult: (args: { messages: MastraDBMessage[]; messageList?: MessageList; abort: (reason?: string) => never; tracingContext?: TracingContext; requestContext?: RequestContext }) => Promise<MessageList | MastraDBMessage[]>
Creates embeddings for new messages to enable future semantic search.
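
A minimal sketch of calling processInput directly, following the documented argument shape; the messages and messageList values are assumed to come from your own pipeline, and abort simply throws:

const result = await processor.processInput({
  messages,      // MastraDBMessage[] gathered by your pipeline (assumed)
  messageList,   // the current MessageList (assumed)
  abort: (reason?: string): never => {
    throw new Error(reason ?? "semantic recall aborted");
  },
});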

Extended usage example

src/mastra/agents/semantic-memory-agent.ts
import { Agent } from "@mastra/core/agent";
import { SemanticRecall, MessageHistory } from "@mastra/core/processors";
import { PostgresStorage, PgVector } from "@mastra/pg";
import { openai } from "@ai-sdk/openai";

const storage = new PostgresStorage({
  id: 'pg-storage',
  connectionString: process.env.DATABASE_URL,
});

const vector = new PgVector({
  id: 'pg-vector',
  connectionString: process.env.DATABASE_URL,
});

const semanticRecall = new SemanticRecall({
  storage,
  vector,
  embedder: openai.embedding("text-embedding-3-small"),
  topK: 5,
  messageRange: { before: 2, after: 1 },
  scope: "resource",
  threshold: 0.7,
});

export const agent = new Agent({
  name: "semantic-memory-agent",
  instructions: "You are a helpful assistant with semantic memory recall",
  model: "openai:gpt-4o",
  inputProcessors: [
    semanticRecall,
    new MessageHistory({ storage, lastMessages: 50 }),
  ],
  outputProcessors: [
    semanticRecall,
    new MessageHistory({ storage }),
  ],
});

Behavior

Input processing

  1. Extracts the user query from the last user message
  2. Generates embeddings for the query
  3. Performs vector search to find semantically similar messages
  4. Retrieves matched messages along with surrounding context (based on messageRange)
  5. For scope: 'resource', formats cross-thread messages as a system message with timestamps
  6. Adds recalled messages with source: 'memory' tag
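
A conceptual sketch of steps 1-4 in TypeScript; embed and search here are hypothetical stand-ins for the embedder and vector store, not Mastra internals:

type Recalled = { id: string; text: string; score: number };

async function recallForQuery(
  query: string,
  embed: (text: string) => Promise<number[]>,
  search: (vector: number[], topK: number) => Promise<Recalled[]>,
  topK = 5,
  threshold = 0.7,
): Promise<Recalled[]> {
  const queryVector = await embed(query);          // steps 1-2: embed the user query
  const matches = await search(queryVector, topK); // step 3: vector search for topK matches
  return matches.filter((m) => m.score >= threshold); // drop hits below the similarity threshold
}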

Output processing

  1. Extracts text content from new user and assistant messages
  2. Generates embeddings for each message
  3. Stores embeddings in the vector store with metadata (message ID, thread ID, resource ID, role, content, timestamp)
  4. Uses LRU caching for embeddings to avoid redundant API calls
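
The caching step can be pictured as memoization keyed by message ID; the sketch below is an illustrative stand-in for Mastra's LRU cache, not its actual implementation:

// Simplified memo cache: embed each message at most once.
const embeddingCache = new Map<string, number[]>();

async function embedOnce(
  messageId: string,
  text: string,
  embed: (t: string) => Promise<number[]>,
): Promise<number[]> {
  const cached = embeddingCache.get(messageId);
  if (cached) return cached;        // cache hit: skip the embedding API call
  const vector = await embed(text); // cache miss: embed and remember
  embeddingCache.set(messageId, vector);
  return vector;
}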

Cross-thread recall

When scope is set to 'resource', the processor can recall messages from other threads. These cross-thread messages are formatted as a system message with timestamps and conversation labels to provide context about when and where the conversation occurred.
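
A hypothetical illustration of the resulting system message shape; the exact labels and formatting Mastra produces may differ:

// Illustrative only: timestamps and thread labels frame each recalled line.
const crossThreadRecall = {
  role: "system",
  content:
    "Recalled from earlier conversations:\n" +
    '[2024-01-03T14:22:00Z] (thread: onboarding) user: "We deploy to Vercel."',
};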