# TokenLimiterProcessor
The `TokenLimiterProcessor` limits the number of tokens in messages. It can be used as an input, per-step input, and output processor:

- **Input processor** (`processInput`): Filters historical messages to fit within the context window before the agentic loop starts, prioritizing recent messages.
- **Per-step input processor** (`processInputStep`): Prunes messages at each step of a multi-step agent workflow, preventing unbounded token growth when tools trigger additional LLM calls.
- **Output processor** (`processOutputStream` / `processOutputResult`): Limits generated response tokens via streaming or non-streaming processing, with configurable strategies for handling exceeded limits.
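The input-side pruning can be sketched as a standalone function. This is a minimal illustration of the idea only: the `Message` shape and the ~4-characters-per-token `estimateTokens` heuristic are assumptions, not Mastra's actual implementation.

```ts
// Minimal sketch of recency-first token pruning (not Mastra's internals).

interface Message {
  role: 'system' | 'user' | 'assistant'
  content: string
}

// Rough ~4-characters-per-token heuristic (assumption, not a real tokenizer).
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4)
}

// Keep system messages, then add the most recent messages that still fit.
function pruneToLimit(messages: Message[], limit: number): Message[] {
  const system = messages.filter((m) => m.role === 'system')
  const rest = messages.filter((m) => m.role !== 'system')

  let used = system.reduce((sum, m) => sum + estimateTokens(m.content), 0)
  const kept: Message[] = []

  // Walk from newest to oldest so recent messages are prioritized.
  for (let i = rest.length - 1; i >= 0; i--) {
    const cost = estimateTokens(rest[i].content)
    if (used + cost > limit) break
    kept.unshift(rest[i])
    used += cost
  }

  return [...system, ...kept]
}
```

Walking the history from newest to oldest is what makes the pruning recency-first: once the budget is exhausted, everything older is dropped in one cut.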
## Usage example
```ts
import { TokenLimiterProcessor } from '@mastra/core/processors'

const processor = new TokenLimiterProcessor({
  limit: 1000,
  strategy: 'truncate',
  countMode: 'cumulative',
})
```
## Constructor parameters

- `options`: Configuration object for the processor.
  - `limit`: Maximum number of tokens allowed.
  - `encoding?`: Optional token encoding used when counting tokens.
  - `strategy?`: How to handle output that exceeds the limit (e.g. `'truncate'`).
  - `countMode?`: How tokens are counted (e.g. `'cumulative'`).
## Returns

- `id`: Unique identifier for the processor.
- `name?`: Optional display name for the processor.
- `processInput`: Prunes historical messages before the agentic loop starts.
- `processInputStep`: Prunes messages at each step of the agentic loop.
- `processOutputStream`: Limits tokens in streamed response parts.
- `processOutputResult`: Limits tokens in the final, non-streamed result.
- `getMaxTokens`: Returns the configured token limit.
## Error behavior
When used as an input processor (both `processInput` and `processInputStep`), `TokenLimiterProcessor` throws a `TripWire` error in the following cases:
- Empty messages: If there are no messages to process, a TripWire is thrown because you can't send an LLM request with no messages.
- System messages exceed limit: If system messages alone exceed the token limit, a TripWire is thrown because you can't send an LLM request with only system messages and no user/assistant messages.
```ts
import { TripWire } from '@mastra/core/agent'

try {
  await agent.generate('Hello')
} catch (error) {
  if (error instanceof TripWire) {
    console.log('Token limit error:', error.message)
  }
}
```
## Extended usage example
### As an input processor (limit context window)
Use `inputProcessors` to limit historical messages sent to the model, which helps stay within context window limits:
```ts
import { Agent } from '@mastra/core/agent'
import { Memory } from '@mastra/memory'
import { TokenLimiterProcessor } from '@mastra/core/processors'

export const agent = new Agent({
  name: 'context-limited-agent',
  instructions: 'You are a helpful assistant',
  model: 'openai/gpt-5.4',
  memory: new Memory({
    /* ... */
  }),
  inputProcessors: [
    new TokenLimiterProcessor({ limit: 4000 }), // Limits historical messages to ~4000 tokens
  ],
})
```
### As a per-step input processor (limit multi-step token growth)
When an agent uses tools across multiple steps (e.g. `maxSteps > 1`), each step accumulates conversation history from all previous steps. Use `inputProcessors` to also limit tokens at each step of the agentic loop; the `TokenLimiterProcessor` automatically applies to both the initial input and every subsequent step:
```ts
import { Agent } from '@mastra/core/agent'
import { TokenLimiterProcessor } from '@mastra/core/processors'

export const agent = new Agent({
  name: 'multi-step-agent',
  instructions: 'You are a helpful research assistant with access to tools',
  model: 'openai/gpt-5.4',
  inputProcessors: [
    new TokenLimiterProcessor({ limit: 8000 }), // Applied at every step
  ],
})

// Each tool call step will be limited to ~8000 input tokens
const result = await agent.generate('Research this topic using your tools', {
  maxSteps: 10,
})
```
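To see why per-step pruning matters, here is a standalone simulation, independent of Mastra, of how history grows across tool-call steps and how a per-step limiter keeps each step's input bounded. The ~4-characters-per-token estimate and the string-based history are assumptions for illustration only.

```ts
// Standalone simulation of multi-step history growth (not Mastra internals).

// Rough ~4-chars-per-token heuristic (assumption, not a real tokenizer).
const estimate = (text: string): number => Math.ceil(text.length / 4)

// Keep only the most recent entries that fit within the limit.
function limitStep(history: string[], limit: number): string[] {
  const kept: string[] = []
  let used = 0
  for (let i = history.length - 1; i >= 0; i--) {
    const cost = estimate(history[i])
    if (used + cost > limit) break
    kept.unshift(history[i])
    used += cost
  }
  return kept
}

// Simulate 10 tool-call steps, each appending a ~100-token tool result.
const history: string[] = ['Research this topic using your tools']
const perStepInputs: number[] = []

for (let step = 0; step < 10; step++) {
  const input = limitStep(history, 200) // per-step limit of ~200 tokens
  perStepInputs.push(input.reduce((sum, m) => sum + estimate(m), 0))
  history.push(`tool result for step ${step}: ` + 'x'.repeat(400))
}
```

Without the limiter, the final step would see the full accumulated history (~1000 tokens in this simulation); with it, every step's input stays at or below the 200-token budget.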
### As an output processor (limit response length)
Use `outputProcessors` to limit the length of generated responses:
```ts
import { Agent } from '@mastra/core/agent'
import { TokenLimiterProcessor } from '@mastra/core/processors'

export const agent = new Agent({
  name: 'response-limited-agent',
  instructions: 'You are a helpful assistant',
  model: 'openai/gpt-5.4',
  outputProcessors: [
    new TokenLimiterProcessor({
      limit: 1000,
      strategy: 'truncate',
      countMode: 'cumulative',
    }),
  ],
})
```
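The `'truncate'` strategy on the output side can be illustrated with a standalone sketch: once the cumulative token count would cross the limit, the remaining chunks are simply dropped. The plain-string chunk shape and the token estimate are assumptions, not Mastra's actual stream parts or `processOutputStream` implementation.

```ts
// Standalone sketch of cumulative truncation over streamed text chunks
// (not Mastra's actual processOutputStream; chunk shape is an assumption).

// Rough ~4-chars-per-token heuristic (assumption, not a real tokenizer).
const countTokens = (text: string): number => Math.ceil(text.length / 4)

// Yield chunks until the cumulative token count would exceed the limit.
function* truncateStream(
  chunks: Iterable<string>,
  limit: number,
): Generator<string> {
  let used = 0
  for (const chunk of chunks) {
    const cost = countTokens(chunk)
    if (used + cost > limit) return // 'truncate': drop the rest silently
    used += cost
    yield chunk
  }
}

const emitted = [...truncateStream(['aaaa', 'bbbb', 'cccc', 'dddd'], 3)]
// Each 4-char chunk ≈ 1 token, so only the first 3 chunks are emitted.
```

A `'cumulative'` count mode, as sketched here, tracks the running total across all chunks, so the cutoff point depends on everything emitted so far rather than on any single chunk's size.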