Recalling Relevant History
If you ask your friend what they did last weekend, they will search in their memory for events associated with “last weekend” and then tell you what they did. That’s sort of like how semantic recall works in Mastra.
How Semantic Recall Works
Semantic recall is RAG-based search that helps agents maintain context across longer interactions when messages are no longer within recent conversation history.
It uses vector embeddings of messages for similarity search, integrates with various vector stores, and has configurable context windows around retrieved messages.
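In rough terms: when a new message arrives, it is embedded, the most similar stored messages are retrieved, and each match is expanded with its neighboring messages before being added to the prompt. The sketch below illustrates that retrieval step in plain TypeScript; it is a simplified illustration of the idea, not Mastra's actual implementation:

```typescript
// Simplified illustration of semantic recall; not Mastra's internals.
type StoredMessage = { index: number; text: string; embedding: number[] };

// Cosine similarity between two embedding vectors.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Find the topK most similar messages, then expand each match with
// `messageRange` neighbors on each side for surrounding context.
function recall(
  queryEmbedding: number[],
  history: StoredMessage[], // assumed ordered; message.index = array position
  topK: number,
  messageRange: number,
): StoredMessage[] {
  const matches = history
    .map((m) => ({ m, score: cosineSimilarity(queryEmbedding, m.embedding) }))
    .sort((a, b) => b.score - a.score)
    .slice(0, topK);

  const included = new Set<number>();
  for (const { m } of matches) {
    for (let i = m.index - messageRange; i <= m.index + messageRange; i++) {
      if (i >= 0 && i < history.length) included.add(i);
    }
  }
  // Return recalled messages in their original conversational order.
  return [...included].sort((a, b) => a - b).map((i) => history[i]);
}
```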

Quick Start
Semantic recall is enabled by default, so if you give your agent memory, it is included automatically:
```typescript
import { Agent } from "@mastra/core/agent";
import { Memory } from "@mastra/memory";
import { openai } from "@ai-sdk/openai";

const agent = new Agent({
  name: "SupportAgent",
  instructions: "You are a helpful support agent.",
  model: openai("gpt-4o"),
  memory: new Memory(),
});
```
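Recall is scoped to a user and a conversation thread, so pass identifiers when you call the agent. The IDs below are hypothetical placeholders:

```typescript
// On later turns, semantically similar older messages from this thread
// are recalled and added to the context automatically.
const response = await agent.generate("What did we decide about my refund?", {
  resourceId: "user_123",
  threadId: "support_thread_456",
});

console.log(response.text);
```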
Recall configuration
The two main parameters that control semantic recall behavior are:
- topK: How many semantically similar messages to retrieve
- messageRange: How much surrounding context to include with each match
```typescript
import { Agent } from "@mastra/core/agent";
import { Memory } from "@mastra/memory";

const agent = new Agent({
  memory: new Memory({
    options: {
      semanticRecall: {
        topK: 3, // Retrieve the 3 most similar messages
        messageRange: 2, // Include 2 messages before and after each match
      },
    },
  }),
});
```
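With these settings, each of the 3 matches pulls in up to 2 messages on either side, so recall can add as many as 15 messages (3 × (1 + 2 + 2)) to the context, fewer if the ranges overlap or fall at the start or end of the thread.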
Storage configuration
Semantic recall relies on a storage provider and a vector database to store messages and their embeddings.
```typescript
import { Memory } from "@mastra/memory";
import { Agent } from "@mastra/core/agent";
import { LibSQLStore } from "@mastra/core/storage/libsql";
import { LibSQLVector } from "@mastra/core/vector/libsql";

const agent = new Agent({
  memory: new Memory({
    // this is the default storage db if omitted
    storage: new LibSQLStore({
      config: {
        url: "file:local.db",
      },
    }),
    // this is the default vector db if omitted
    vector: new LibSQLVector({
      connectionUrl: "file:local.db",
    }),
  }),
});
```
The LibSQL defaults above are fine for local development; in production you can swap in any supported storage/vector pair. For example, with Postgres via @mastra/pg (a sketch; constructor shapes can vary between versions, so check your installed package):
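```typescript
import { Memory } from "@mastra/memory";
// PostgresStore and PgVector come from @mastra/pg; exact constructor
// options may differ depending on the package version you have installed.
import { PostgresStore, PgVector } from "@mastra/pg";

const memory = new Memory({
  storage: new PostgresStore({
    connectionString: process.env.DATABASE_URL!,
  }),
  vector: new PgVector({
    connectionString: process.env.DATABASE_URL!,
  }),
});
```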
Embedder configuration
Semantic recall relies on an embedding model to convert messages into embeddings. By default, Mastra uses FastEmbed, but you can specify another embedding model:
```typescript
import { Memory } from "@mastra/memory";
import { Agent } from "@mastra/core/agent";
import { openai } from "@ai-sdk/openai";

const agent = new Agent({
  memory: new Memory({
    embedder: openai.embedding("text-embedding-3-small"),
  }),
});
```
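Any AI SDK embedding model should plug in the same way; for example, Cohere (a sketch assuming the @ai-sdk/cohere package):

```typescript
import { Memory } from "@mastra/memory";
import { cohere } from "@ai-sdk/cohere";

const memory = new Memory({
  embedder: cohere.embedding("embed-english-v3.0"),
});
```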
Disabling
Semantic recall has a performance cost: each new message must be converted into an embedding and used to query the vector database before it can be sent to the LLM.
Semantic recall is enabled by default but can be disabled when not needed:
```typescript
const agent = new Agent({
  memory: new Memory({
    options: {
      semanticRecall: false,
    },
  }),
});
```
You might want to disable semantic recall in scenarios like:
- When recent conversation history already provides sufficient context for the current conversation.
- In performance-sensitive applications, like real-time two-way audio, where the added latency of creating embeddings and running vector queries is noticeable.