# Memory Processors
Memory Processors allow you to modify the list of messages retrieved from memory before they are added to the agent’s context window and sent to the LLM. This is useful for managing context size, filtering content, and optimizing performance.
Processors operate on the messages retrieved based on your memory configuration (e.g., `lastMessages`, `semanticRecall`). They do not affect the new incoming user message.
## Built-in Processors
Mastra provides built-in processors:
### TokenLimiter
This processor prevents errors caused by exceeding the LLM's context window limit. It counts the tokens in the retrieved memory messages and removes the oldest messages until the total count is below the specified limit.
```typescript
import { Memory } from "@mastra/memory";
import { TokenLimiter } from "@mastra/memory/processors";
import { Agent } from "@mastra/core";
import { openai } from "@ai-sdk/openai";

const agent = new Agent({
  name: "memory-limited-agent", // illustrative name
  instructions: "You are a helpful assistant.", // illustrative instructions
  model: openai("gpt-4o"),
  memory: new Memory({
    processors: [
      // Ensure the total tokens from memory don't exceed ~127k
      new TokenLimiter(127000),
    ],
  }),
});
```
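As a quick usage sketch, recall (and therefore the processor chain) runs when the agent generates a response for a thread. The `threadId`/`resourceId` values below are placeholders, and the option names assume the thread-scoped call signature from Mastra's memory docs:

```typescript
// Hypothetical usage: the TokenLimiter trims recalled history from this
// thread before it reaches the model.
const response = await agent.generate("What did we discuss earlier?", {
  threadId: "thread-123", // placeholder IDs
  resourceId: "user-456",
});
```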
The `TokenLimiter` uses the `o200k_base` encoding by default (suitable for GPT-4o). You can specify other encodings if needed for different models:
```typescript
// Import the encoding you need (e.g., for older OpenAI models)
import cl100k_base from "js-tiktoken/ranks/cl100k_base";

const memoryForOlderModel = new Memory({
  processors: [
    new TokenLimiter({
      limit: 16000, // Example limit for a 16k-context model
      encoding: cl100k_base,
    }),
  ],
});
```
See the OpenAI cookbook or the `js-tiktoken` repo for more on encodings.
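The default can also be passed explicitly; `js-tiktoken` ships `o200k_base` under the same `ranks` path as the `cl100k_base` import above (shown here for symmetry, equivalent to the default behavior):

```typescript
// Spelling out the default encoding explicitly
import o200k_base from "js-tiktoken/ranks/o200k_base";

const memoryForGpt4o = new Memory({
  processors: [new TokenLimiter({ limit: 127000, encoding: o200k_base })],
});
```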
### ToolCallFilter
This processor removes tool calls from the memory messages sent to the LLM. It saves tokens by excluding potentially verbose tool interactions from the context, which is useful if the details aren’t needed for future interactions. It’s also useful if you always want your agent to call a specific tool again and not rely on previous tool results in memory.
```typescript
import { Memory } from "@mastra/memory";
import { ToolCallFilter, TokenLimiter } from "@mastra/memory/processors";

const memoryFilteringTools = new Memory({
  processors: [
    // Example 1: Remove all tool calls/results
    new ToolCallFilter(),
    // Example 2: Remove only noisy image generation tool calls/results
    // (in practice you would pick one of these two, not both)
    new ToolCallFilter({ exclude: ["generateImageTool"] }),
    // Always place TokenLimiter last
    new TokenLimiter(127000),
  ],
});
```
## Applying Multiple Processors
You can chain multiple processors. They execute in the order they appear in the `processors` array, and the output of one processor becomes the input for the next.
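Conceptually, the chain behaves like a reduce over the processors (a simplified illustration, not Mastra's actual internals):

```typescript
import type { CoreMessage } from "@mastra/memory";
import type { MemoryProcessor } from "@mastra/core/memory";

// Each processor receives the previous processor's output.
function applyChain(
  recalled: CoreMessage[],
  processors: MemoryProcessor[],
): CoreMessage[] {
  return processors.reduce((msgs, processor) => processor.process(msgs, {}), recalled);
}
```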
Order matters! It's generally best practice to place `TokenLimiter` last in the chain. This ensures it operates on the final set of messages after other filtering has occurred, providing the most accurate token limit enforcement.
```typescript
import { Memory } from "@mastra/memory";
import { ToolCallFilter, TokenLimiter } from "@mastra/memory/processors";
// Assume a hypothetical 'PIIFilter' custom processor exists
// import { PIIFilter } from './custom-processors';

const memoryWithMultipleProcessors = new Memory({
  processors: [
    // 1. Filter specific tool calls first
    new ToolCallFilter({ exclude: ["verboseDebugTool"] }),
    // 2. Apply custom filtering (e.g., remove hypothetical PII - use with caution)
    // new PIIFilter(),
    // 3. Apply token limiting as the final step
    new TokenLimiter(127000),
  ],
});
```
## Creating Custom Processors
You can create custom logic by extending the base `MemoryProcessor` class.
```typescript
import { Memory, CoreMessage } from "@mastra/memory";
import { MemoryProcessor, MemoryProcessorOpts } from "@mastra/core/memory";
import { TokenLimiter } from "@mastra/memory/processors";

class ConversationOnlyFilter extends MemoryProcessor {
  constructor() {
    // Provide a name for easier debugging if needed
    super({ name: "ConversationOnlyFilter" });
  }

  process(
    messages: CoreMessage[],
    _opts: MemoryProcessorOpts = {}, // Options passed during memory retrieval, rarely needed here
  ): CoreMessage[] {
    // Filter messages based on role, keeping only the user/assistant exchange
    return messages.filter(
      (msg) => msg.role === "user" || msg.role === "assistant",
    );
  }
}

// Use the custom processor
const memoryWithCustomFilter = new Memory({
  processors: [
    new ConversationOnlyFilter(),
    new TokenLimiter(127000), // Still apply token limiting
  ],
});
```
When creating custom processors, avoid mutating the input `messages` array or its objects directly; return a new array (and new message objects where content changes) instead.
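For instance, the hypothetical `PIIFilter` mentioned earlier could be sketched along these lines (the class name and regex are illustrative, not part of Mastra); note that it builds new message objects rather than editing the ones it receives:

```typescript
import { MemoryProcessor } from "@mastra/core/memory";
import type { CoreMessage } from "@mastra/memory";

// Hypothetical example only -- real PII handling needs far more robust detection.
class PIIFilter extends MemoryProcessor {
  constructor() {
    super({ name: "PIIFilter" });
  }

  process(messages: CoreMessage[]): CoreMessage[] {
    // Copy each message that needs changes instead of mutating it
    return messages.map((msg) =>
      typeof msg.content === "string"
        ? { ...msg, content: msg.content.replace(/\S+@\S+\.\S+/g, "[REDACTED]") }
        : msg,
    );
  }
}
```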