To work effectively, AI agents need the right context from previous interactions. But the right context is often hard to define programmatically! This is where agent memory comes in.
Implementing a memory system for agents usually involves:
- Persisting memory in backend storage
- Storing and retrieving conversation history
- Finding relevant context from past interactions using semantic search
So we built a Memory API for agents in Mastra. This includes:
- Conversation context management
- Semantic search over past interactions
- Storage backend abstraction
- Thread sharing between agents
This post will walk you through implementing these capabilities in your application, from local development to production deployment.
Implementation Guide
Let's build an agent with memory capabilities. Mastra's Memory API comes with sensible defaults that work out of the box, while still being configurable when needed.
Prerequisites:
npm install @mastra/core @mastra/memory
Setting Up Memory
The default configuration initializes a LibSQL database for memory storage, retrieval, and semantic search, and uses the fastembed-js library to download a local model for on-device embedding.
Because of this, you can initialize a Memory instance without any configuration:
import { Memory } from "@mastra/memory";
const memory = new Memory();
Now let's create an agent and attach memory to it:
import { Agent } from "@mastra/core";
import { Memory } from "@mastra/memory";
const memory = new Memory();
const agent = new Agent({
memory,
// Additional agent configuration (name, instructions, model, etc.)
});
That's it! You've built an agent with a memory system that includes:
- Persistent storage via LibSQL
- Built-in vector search and embedding capabilities via fastembed-js
- Automatic context management
For custom storage backends, vector databases, or embedding models, check out the Memory Configuration Guide.
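As a quick sanity check, here's a hypothetical exchange showing memory at work. The IDs are placeholders, and resourceId and threadId are covered in more detail below.

// First request: tell the agent a fact
await agent.stream("My favorite color is blue.", {
  resourceId: "user_1",
  threadId: "thread_1",
});

// A later request in the same thread: the stored history lets the
// agent recall the fact without it being restated
await agent.stream("What's my favorite color?", {
  resourceId: "user_1",
  threadId: "thread_1",
});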
Configure Memory Parameters
Here's a reasonable set of message retrieval and search parameters you might give an agent. It pulls in a window of recent messages, older messages that are semantically similar, and the context surrounding those matches.
const memory = new Memory({
options: {
lastMessages: 100, // Number of recent messages to include
semanticRecall: {
topK: 2, // Number of similar messages to retrieve
messageRange: 2, // Number of messages before/after each result
},
},
});
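With these settings, the recalled context per request is roughly lastMessages plus topK × (1 + 2 × messageRange) messages: here, 100 + 2 × 5 = 110 messages, ignoring any overlap between the recent window and the semantic matches.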
Advanced Memory Usage
Using Resources and Threads
When making a request, you can pass two identifiers that scope the agent's memory, and that let other agents attach to the same resources or threads:
- resourceId: Identifier for the user/entity making the request
- threadId: Identifier for the conversation thread
await agent.stream("Message text", {
resourceId: "user_123",
threadId: "thread_123",
});
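Because threads are keyed by these IDs rather than by a particular agent, a second agent backed by the same Memory instance can join the conversation. Here's a minimal sketch (supportAgent is a hypothetical second agent):

// A second agent sharing the same Memory (and therefore the same storage)
const supportAgent = new Agent({
  memory,
  // Additional agent configuration options
});

// It can read and extend the same conversation thread
await supportAgent.stream("Follow-up message", {
  resourceId: "user_123",
  threadId: "thread_123",
});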
Overriding Memory Parameters
While an agent has a default memory configuration, you can override those parameters for an individual request:
await agent.stream("Message text", {
memoryOptions: {
lastMessages: 10,
semanticRecall: {
topK: 3,
messageRange: 5,
},
},
});
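Per-request overrides are handy when one agent serves mixed workloads, for example shrinking lastMessages for quick, self-contained commands while widening topK and messageRange for questions that lean on older history.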
Summary
The Memory API implements these core functions:
- Message Storage: Stores messages and metadata in SQL tables, indexed by thread and resource IDs
- Vector Search: Converts messages to embeddings for similarity search
- Context Management: Retrieves recent messages and semantically similar historical context
- Thread Sharing: Enables multiple agents to access shared conversation threads