
TokenLimiterProcessor

The TokenLimiterProcessor limits the number of tokens in messages. It can be used as an input processor, a per-step input processor, or an output processor:

  • Input processor (processInput): Filters historical messages to fit within the context window before the agentic loop starts, prioritizing recent messages
  • Per-step input processor (processInputStep): Prunes messages at each step of a multi-step agent workflow, preventing unbounded token growth when tools trigger additional LLM calls
  • Output processor (processOutputStream, processOutputResult): Limits generated response tokens in both streaming and non-streaming scenarios, with configurable strategies for handling exceeded limits

Usage example

import { TokenLimiterProcessor } from '@mastra/core/processors'

const processor = new TokenLimiterProcessor({
  limit: 1000,
  strategy: 'truncate',
  countMode: 'cumulative',
})

Constructor parameters

options:

number | Options
Either a simple number for the token limit, or a configuration options object

limit:

number
Maximum number of tokens to allow in the messages or response

encoding?:

TiktokenBPE
Optional encoding to use. Defaults to o200k_base which is used by gpt-5.1

strategy?:

'truncate' | 'abort'
Strategy to apply when the token limit is reached: 'truncate' stops emitting chunks; 'abort' calls abort() to terminate the stream

countMode?:

'cumulative' | 'part'
Whether to count tokens from the beginning of the stream or only in the current part: 'cumulative' counts all tokens emitted since the start of the stream; 'part' counts only the tokens in the current part
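As a rough illustration of the difference between the two modes, the limit check could be sketched like this (a simplified stand-in, not the library's actual implementation; real token counts come from the tiktoken encoding, while this proxy just splits on whitespace):

```typescript
// Crude token-count stand-in for illustration only.
function countTokens(text: string): number {
  return text.split(/\s+/).filter(Boolean).length;
}

// Given all stream parts seen so far, decide whether the newest part
// pushes the stream over the limit under each countMode.
function exceedsLimit(
  parts: string[],
  limit: number,
  countMode: 'cumulative' | 'part',
): boolean {
  if (countMode === 'part') {
    // Only the current (most recent) part is measured against the limit.
    return countTokens(parts[parts.length - 1]) > limit;
  }
  // 'cumulative': every part emitted since the start counts toward the limit.
  const total = parts.reduce((sum, p) => sum + countTokens(p), 0);
  return total > limit;
}
```

With a limit of 3, the parts `['one two', 'three four']` pass in 'part' mode (each part is only 2 tokens) but fail in 'cumulative' mode (4 tokens total).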

Returns

id:

string
Processor identifier set to 'token-limiter'

name?:

string
Optional processor display name

processInput:

(args: { messages: MastraDBMessage[]; abort: (reason?: string) => never }) => Promise<MastraDBMessage[]>
Filters input messages to fit within the token limit before the agentic loop starts, prioritizing recent messages while preserving system messages

processInputStep:

(args: ProcessInputStepArgs) => Promise<void>
Prunes messages at each step of the agentic loop (including tool call continuations) to keep the conversation within the token limit. Mutates the messageList directly by removing oldest messages first while preserving system messages.
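The pruning behavior described above can be sketched as follows (an assumed, simplified model, not the library's actual implementation; the Msg type and whitespace-based token counting are stand-ins for MastraDBMessage and the tiktoken encoding):

```typescript
// Simplified message shape for illustration.
type Msg = { role: 'system' | 'user' | 'assistant'; content: string };

// Keep system messages, then keep the most recent non-system messages
// that still fit within the remaining token budget.
function pruneToLimit(messages: Msg[], limit: number): Msg[] {
  const tokens = (m: Msg) => m.content.split(/\s+/).filter(Boolean).length;
  const system = messages.filter(m => m.role === 'system');
  let budget = limit - system.reduce((sum, m) => sum + tokens(m), 0);
  const kept: Msg[] = [];
  // Walk from newest to oldest, so the oldest messages are dropped first.
  for (let i = messages.length - 1; i >= 0; i--) {
    const m = messages[i];
    if (m.role === 'system') continue;
    const cost = tokens(m);
    if (cost > budget) break;
    kept.unshift(m);
    budget -= cost;
  }
  return [...system, ...kept];
}
```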

processOutputStream:

(args: { part: ChunkType; streamParts: ChunkType[]; state: Record<string, any>; abort: (reason?: string) => never }) => Promise<ChunkType | null>
Processes streaming output parts to limit token count during streaming

processOutputResult:

(args: { messages: MastraDBMessage[]; abort: (reason?: string) => never }) => Promise<MastraDBMessage[]>
Processes final output results to limit token count in non-streaming scenarios

getMaxTokens:

() => number
Get the maximum token limit

Error behavior

When used as an input processor (both processInput and processInputStep), TokenLimiterProcessor throws a TripWire error in the following cases:

  • Empty messages: If there are no messages to process, a TripWire is thrown because you can't send an LLM request with no messages.
  • System messages exceed limit: If system messages alone exceed the token limit, a TripWire is thrown because you can't send an LLM request with only system messages and no user/assistant messages.
import { TripWire } from '@mastra/core/agent'

try {
  await agent.generate('Hello')
} catch (error) {
  if (error instanceof TripWire) {
    console.log('Token limit error:', error.message)
  }
}

Extended usage example

As an input processor (limit context window)

Use inputProcessors to limit historical messages sent to the model, which helps stay within context window limits:

src/mastra/agents/context-limited-agent.ts
import { Agent } from '@mastra/core/agent'
import { Memory } from '@mastra/memory'
import { TokenLimiterProcessor } from '@mastra/core/processors'

export const agent = new Agent({
  name: 'context-limited-agent',
  instructions: 'You are a helpful assistant',
  model: 'openai/gpt-5.4',
  memory: new Memory({
    /* ... */
  }),
  inputProcessors: [
    new TokenLimiterProcessor({ limit: 4000 }), // Limits historical messages to ~4000 tokens
  ],
})

As a per-step input processor (limit multi-step token growth)

When an agent uses tools across multiple steps (e.g. maxSteps > 1), each step accumulates the conversation history from all previous steps. Use inputProcessors to limit tokens at each step of the agentic loop as well; the TokenLimiterProcessor automatically applies to both the initial input and every subsequent step:

src/mastra/agents/multi-step-agent.ts
import { Agent } from '@mastra/core/agent'
import { TokenLimiterProcessor } from '@mastra/core/processors'

export const agent = new Agent({
  name: 'multi-step-agent',
  instructions: 'You are a helpful research assistant with access to tools',
  model: 'openai/gpt-5.4',
  inputProcessors: [
    new TokenLimiterProcessor({ limit: 8000 }), // Applied at every step
  ],
})

// Each tool call step will be limited to ~8000 input tokens
const result = await agent.generate('Research this topic using your tools', {
  maxSteps: 10,
})

As an output processor (limit response length)

Use outputProcessors to limit the length of generated responses:

src/mastra/agents/response-limited-agent.ts
import { Agent } from '@mastra/core/agent'
import { TokenLimiterProcessor } from '@mastra/core/processors'

export const agent = new Agent({
  name: 'response-limited-agent',
  instructions: 'You are a helpful assistant',
  model: 'openai/gpt-5.4',
  outputProcessors: [
    new TokenLimiterProcessor({
      limit: 1000,
      strategy: 'truncate',
      countMode: 'cumulative',
    }),
  ],
})
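To see how the two overflow strategies differ, here is a rough, self-contained sketch (not the library's internals; throwing an Error stands in for the abort() call, and tokens are approximated by whitespace splitting):

```typescript
// Illustrative stand-in for the output-stream limiting logic.
// 'truncate' silently drops everything after the limit is hit;
// 'abort' terminates the stream with an error instead.
function limitStream(
  chunks: string[],
  limit: number,
  strategy: 'truncate' | 'abort',
): string[] {
  const out: string[] = [];
  let total = 0;
  for (const chunk of chunks) {
    total += chunk.split(/\s+/).filter(Boolean).length; // crude token proxy
    if (total > limit) {
      if (strategy === 'abort') {
        throw new Error('Token limit exceeded'); // stands in for abort()
      }
      break; // 'truncate': stop emitting remaining chunks
    }
    out.push(chunk);
  }
  return out;
}
```

With a limit of 4, the chunks `['a b', 'c d', 'e f']` yield `['a b', 'c d']` under 'truncate', while 'abort' throws as soon as the third chunk exceeds the budget.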