
TokenLimiterProcessor

The TokenLimiterProcessor limits the number of tokens in messages. It can be used as both an input and output processor:

  • Input processor: Filters historical messages to fit within the context window, prioritizing recent messages
  • Output processor: Limits the number of tokens in generated responses, in both streaming and non-streaming modes, with configurable strategies for handling an exceeded limit

Usage example

import { TokenLimiterProcessor } from "@mastra/core/processors";

const processor = new TokenLimiterProcessor({
  limit: 1000,
  strategy: "truncate",
  countMode: "cumulative",
});

Constructor parameters

options: number | Options
Either a number to use directly as the token limit, or a configuration options object.
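
For example, the two forms below configure the same limit; the options object is only needed when you also want to set strategy, countMode, or encoding (variable names here are illustrative):

import { TokenLimiterProcessor } from "@mastra/core/processors";

// Shorthand: a bare number is treated as the token limit
const simple = new TokenLimiterProcessor(1000);

// Equivalent options-object form, ready for additional settings
const detailed = new TokenLimiterProcessor({ limit: 1000 });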

Options

limit: number
Maximum number of tokens to allow in the response.

encoding?: TiktokenBPE
Optional encoding to use. Defaults to o200k_base, the encoding used by gpt-4o.

strategy?: 'truncate' | 'abort'
Strategy when the token limit is reached: 'truncate' stops emitting chunks, while 'abort' calls abort() to stop the stream. Both are shown in the sketch after this list.

countMode?: 'cumulative' | 'part'
Whether to count tokens from the beginning of the stream or only in the current part: 'cumulative' counts all tokens from the start, 'part' counts only the tokens in the current part.
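
As a sketch of how these two options combine, the configurations below pair each strategy with a countMode (the variable names are illustrative):

import { TokenLimiterProcessor } from "@mastra/core/processors";

// Count every token from the start of the stream and call abort()
// once the running total exceeds 500
const hardStop = new TokenLimiterProcessor({
  limit: 500,
  strategy: "abort",
  countMode: "cumulative",
});

// Check each part's token count against the limit on its own;
// when a part exceeds 500 tokens, stop emitting chunks
const perPart = new TokenLimiterProcessor({
  limit: 500,
  strategy: "truncate",
  countMode: "part",
});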

Returns

id: string
Processor identifier, set to 'token-limiter'.

name?: string
Optional processor display name.

processInput: (args: { messages: MastraDBMessage[]; abort: (reason?: string) => never }) => Promise<MastraDBMessage[]>
Filters input messages to fit within the token limit, prioritizing recent messages while preserving system messages.

processOutputStream: (args: { part: ChunkType; streamParts: ChunkType[]; state: Record<string, any>; abort: (reason?: string) => never }) => Promise<ChunkType | null>
Processes streaming output parts to limit the token count during streaming.

processOutputResult: (args: { messages: MastraDBMessage[]; abort: (reason?: string) => never }) => Promise<MastraDBMessage[]>
Processes the final output messages to limit the token count in non-streaming scenarios.

getMaxTokens: () => number
Returns the configured maximum token limit. See the sketch after this list.
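
For example, getMaxTokens can be used to read the configured limit back at runtime, e.g. for logging or assertions; a minimal sketch:

import { TokenLimiterProcessor } from "@mastra/core/processors";

const processor = new TokenLimiterProcessor({ limit: 1000 });

// Read back the configured limit
console.log(processor.getMaxTokens()); // 1000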

Extended usage example

As an input processor (limit context window)

Use inputProcessors to limit the historical messages sent to the model, which helps the agent stay within its context window:

src/mastra/agents/context-limited-agent.ts
import { Agent } from "@mastra/core/agent";
import { Memory } from "@mastra/memory";
import { TokenLimiterProcessor } from "@mastra/core/processors";

export const agent = new Agent({
  name: "context-limited-agent",
  instructions: "You are a helpful assistant",
  model: "openai/gpt-4o",
  memory: new Memory({ /* ... */ }),
  inputProcessors: [
    new TokenLimiterProcessor({ limit: 4000 }), // Limits historical messages to ~4000 tokens
  ],
});

As an output processor (limit response length)

Use outputProcessors to limit the length of generated responses:

src/mastra/agents/response-limited-agent.ts
import { Agent } from "@mastra/core/agent";
import { TokenLimiterProcessor } from "@mastra/core/processors";

export const agent = new Agent({
  name: "response-limited-agent",
  instructions: "You are a helpful assistant",
  model: "openai/gpt-4o",
  outputProcessors: [
    new TokenLimiterProcessor({
      limit: 1000,
      strategy: "truncate",
      countMode: "cumulative",
    }),
  ],
});
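
With strategy: "truncate" and countMode: "cumulative", the response simply stops emitting chunks once roughly 1000 tokens have streamed from the start of the response; use strategy: "abort" instead if you prefer abort() to be called on the stream when the limit is hit.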