# TokenLimiterProcessor

The `TokenLimiterProcessor` limits the number of tokens in messages. It can be used as both an input and an output processor:

- **Input processor**: Filters historical messages to fit within the context window, prioritizing recent messages
- **Output processor**: Limits the tokens in generated responses, in both streaming and non-streaming modes, with configurable strategies for handling exceeded limits

## Usage example

```typescript
import { TokenLimiterProcessor } from "@mastra/core/processors";

const processor = new TokenLimiterProcessor({
  limit: 1000,
  strategy: "truncate",
  countMode: "cumulative"
});
```

## Constructor parameters

**options:** (`number | Options`): Either a plain number used as the token limit, or a configuration options object

### Options

**limit:** (`number`): Maximum number of tokens to allow in the response

**encoding?:** (`TiktokenBPE`): Optional encoding to use. Defaults to `o200k_base`, the encoding used by GPT-4o models

**strategy?:** (`'truncate' | 'abort'`): Strategy when the token limit is reached: `'truncate'` stops emitting chunks, `'abort'` calls `abort()` to stop the stream

**countMode?:** (`'cumulative' | 'part'`): Whether to count tokens from the beginning of the stream or only in the current part: `'cumulative'` counts all tokens from the start, `'part'` counts only the tokens in the current part
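Both constructor forms set the same limit. The sketch below shows the numeric shorthand alongside a full options object with an encoding override; it assumes the `TiktokenBPE` rank tables are imported from `js-tiktoken` (the library that defines that type) — adjust the import to wherever your encoding comes from.

```typescript
import { TokenLimiterProcessor } from "@mastra/core/processors";
// Assumption: rank tables ship with js-tiktoken, which defines TiktokenBPE.
import cl100k_base from "js-tiktoken/ranks/cl100k_base";

// Numeric shorthand: equivalent to `{ limit: 500 }` with the default
// o200k_base encoding.
const simple = new TokenLimiterProcessor(500);

// Full options: count with cl100k_base instead of the default, abort the
// stream when the limit is hit, and count each part independently rather
// than cumulatively across the stream.
const strict = new TokenLimiterProcessor({
  limit: 2000,
  encoding: cl100k_base,
  strategy: "abort",
  countMode: "part",
});

console.log(strict.getMaxTokens()); // 2000
```

For user-facing responses, `'truncate'` is usually the gentler choice, since it simply stops emitting chunks once the limit is reached instead of aborting the stream.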
## Returns

**id:** (`string`): Processor identifier, set to `'token-limiter'`

**name?:** (`string`): Optional processor display name

**processInput:** (`(args: { messages: MastraDBMessage[]; abort: (reason?: string) => never }) => Promise<MastraDBMessage[]>`): Filters input messages to fit within the token limit, prioritizing recent messages while preserving system messages

**processOutputStream:** (`(args: { part: ChunkType; streamParts: ChunkType[]; state: Record<string, any>; abort: (reason?: string) => never }) => Promise<ChunkType | null | undefined>`): Processes streaming output parts to limit token count during streaming

**processOutputResult:** (`(args: { messages: MastraDBMessage[]; abort: (reason?: string) => never }) => Promise<MastraDBMessage[]>`): Processes final output results to limit token count in non-streaming scenarios

**getMaxTokens:** (`() => number`): Returns the maximum token limit

## Error behavior

When used as an input processor, `TokenLimiterProcessor` throws a `TripWire` error in the following cases:

- **Empty messages**: If there are no messages to process, a `TripWire` is thrown because you cannot send an LLM request with no messages.
- **System messages exceed limit**: If the system messages alone exceed the token limit, a `TripWire` is thrown because you cannot send an LLM request with only system messages and no user/assistant messages.

```typescript
import { TripWire } from "@mastra/core/agent";

try {
  await agent.generate("Hello");
} catch (error) {
  if (error instanceof TripWire) {
    console.log("Token limit error:", error.message);
  }
}
```

## Extended usage example

### As an input processor (limit context window)

Use `inputProcessors` to limit the historical messages sent to the model, which helps stay within context window limits:

```typescript
import { Agent } from "@mastra/core/agent";
import { Memory } from "@mastra/memory";
import { TokenLimiterProcessor } from "@mastra/core/processors";

export const agent = new Agent({
  name: "context-limited-agent",
  instructions: "You are a helpful assistant",
  model: "openai/gpt-4o",
  memory: new Memory({ /* ... */ }),
  inputProcessors: [
    new TokenLimiterProcessor({ limit: 4000 }) // Limits historical messages to ~4000 tokens
  ]
});
```

### As an output processor (limit response length)

Use `outputProcessors` to limit the length of generated responses:

```typescript
import { Agent } from "@mastra/core/agent";
import { TokenLimiterProcessor } from "@mastra/core/processors";

export const agent = new Agent({
  name: "response-limited-agent",
  instructions: "You are a helpful assistant",
  model: "openai/gpt-4o",
  outputProcessors: [
    new TokenLimiterProcessor({
      limit: 1000,
      strategy: "truncate",
      countMode: "cumulative"
    })
  ]
});
```

## Related

- [Guardrails](https://mastra.ai/docs/agents/guardrails)