We're excited to announce that in @mastra/core@0.14.0 we shipped output processors.
Output processors are the gatekeepers of your AI applications. They sit between your language model's response and your users.
While input processors handle what goes into your LLM, output processors handle what comes out. They can catch hallucinations, redact sensitive information, enforce token limits, and ensure your AI stays on-brand.
How Output Processors Work
Output processors operate on the model's output messages in a pipeline: each processor receives the output from the previous one, in the order they're listed in your `outputProcessors` array. If any processor calls `abort()`, the request terminates immediately and subsequent processors don't run. The response includes a `tripwireReason` explaining why the content was blocked.
Output processors can work in two modes:
- streaming (processing chunks as they arrive)
- batch (processing the complete response).
This lets you implement real-time filtering while still doing final validation.
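For example, here's a minimal sketch of checking for a tripwire after a call. It assumes `agent` has already been configured with one or more output processors (configuration is shown later in this post), and that a blocked response surfaces the reason as a `tripwireReason` property, as described above.

```typescript
// Assumes `agent` was created with outputProcessors configured.
const result = await agent.generateVNext("Summarize my account history");

if (result.tripwireReason) {
  // A processor called abort(); the processors after it never ran.
  console.warn("Response blocked:", result.tripwireReason);
} else {
  console.log(result.text);
}
```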
Out-of-the-box processors
Mastra ships a few out-of-the-box processors:
- Content moderation (`ModerationProcessor`)
- PII protection (`PIIDetector`)
- Stream optimization (`BatchPartsProcessor`)
- Security (`SystemPromptScrubber`)
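Here's a rough sketch of how a couple of these could be wired up. The constructor options shown are illustrative assumptions; check each processor's reference docs for the exact names.

```typescript
import { PIIDetector, BatchPartsProcessor } from "@mastra/core/processors";
import { openai } from "@ai-sdk/openai";

// Processors run in array order: the first entry sees the raw model output.
const outputProcessors = [
  // Redact emails, phone numbers, etc. before they reach the user
  // (options here are illustrative).
  new PIIDetector({ model: openai("gpt-4.1-nano"), strategy: "redact" }),
  // Group tiny text-delta chunks into larger parts for smoother streaming.
  new BatchPartsProcessor({ batchSize: 5 }),
];
```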
Custom Output Processors
You can also build processors tailored to your application or business. Here's a processor that ensures customer service responses stay professional:
```typescript
import type { Processor } from "@mastra/core/processors";
import type { ChunkType } from "@mastra/core/stream";

class ToneEnforcer implements Processor {
  readonly name = "tone-enforcer";

  constructor(private requiredTone: "professional" | "casual" | "technical") {}

  async processOutputStream({
    part,
    abort,
  }: {
    part: ChunkType;
    abort: (reason?: string) => never;
  }): Promise<ChunkType> {
    // Only check complete sentences
    if (part.type === "text-delta" && part.payload.text.includes(".")) {
      const tone = await this.analyzeTone(part.payload.text);

      if (tone !== this.requiredTone) {
        abort(`Response tone was ${tone}, expected ${this.requiredTone}`);
      }
    }

    return part;
  }

  // Classify the tone however you like, e.g. with a small, fast model.
  private async analyzeTone(text: string): Promise<string> {
    // ...
    return "professional";
  }
}
```
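Registering it looks the same as with the built-ins; the agent configuration below is a hypothetical example.

```typescript
import { Agent } from "@mastra/core/agent";
import { openai } from "@ai-sdk/openai";

const supportAgent = new Agent({
  name: "support-agent",
  instructions: "Answer customer questions accurately and politely.",
  model: openai("gpt-4o"),
  // The custom processor slots into the same pipeline as the built-ins.
  outputProcessors: [new ToneEnforcer("professional")],
});
```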
Performance Considerations
Output processors add latency to every response. Here's how to minimize impact:
- Use Small Models. You're on the critical path, so we recommend starting with small, fast models unless the accuracy isn't good enough. The difference between `gpt-4o` and `gpt-4.1-nano` can be 10x in latency.
- Put LLM calls at the end. If you're using multiple processors, run deterministic processors before calling an LLM.
- Minimize LLM Output Tokens. Keep the happy path's output as small as possible, as in the schema comparison below.
1import { z } from "zod";
2
3// Bad: Forces verbose responses
4z.object({
5 inappropriate: z.boolean(),
6 reason: z.string(),
7 severity: z.enum(["low", "medium", "high"]),
8 categories: z.array(z.string()),
9});
10
11// Good: Allows minimal responses
12z.object({
13 severity: z.enum(["low", "medium", "high"]).optional(),
14 category: z.string().optional(), // Only if problematic
15});
A clean response is just `{}`: two tokens instead of dozens.
- Push non-critical processing async. Some processors don't need to block the response. Log analytics asynchronously instead of making users wait.
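For instance, here's a sketch of a processor that records the final response for analytics without blocking it. The `processOutputResult` parameter and return shapes are assumptions based on the examples in this post, and the logging sink is a placeholder.

```typescript
import type { Processor } from "@mastra/core/processors";

class AnalyticsLogger implements Processor {
  readonly name = "analytics-logger";

  // Runs once on the complete response, after streaming finishes.
  async processOutputResult({ messages }: { messages: unknown[] }) {
    // Fire and forget: kick off logging but don't await it,
    // so the user gets the response immediately.
    void this.log(messages).catch((err) =>
      console.error("analytics logging failed", err),
    );

    return messages;
  }

  // Placeholder for your analytics sink (queue, HTTP endpoint, etc.).
  private async log(messages: unknown[]): Promise<void> {
    // ...
  }
}
```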
Integration with Streaming
Output processors work with both regular and streaming responses:
```typescript
// Regular generation
const result = await agent.generateVNext("Tell me about quantum computing");
// Processors run, then you get the processed result
console.log(result.text); // Processed text
console.log(result.object); // Structured data if applicable

// Streaming
const stream = await agent.streamVNext("Explain machine learning");
for await (const part of stream.textStream) {
  // Each part is already processed
  process.stdout.write(part);
}
```
For streaming, processors can work on individual chunks (`processOutputStream`) or wait for the complete response (`processOutputResult`). This flexibility lets you implement real-time filtering while still doing final validation.
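A single processor can implement both hooks. As a sketch under the same assumptions as the earlier examples: a per-chunk check can miss a banned string that's split across two chunks, so a final pass over the assembled response catches what the streaming check can't.

```typescript
import type { Processor } from "@mastra/core/processors";
import type { ChunkType } from "@mastra/core/stream";

class LeakGuard implements Processor {
  readonly name = "leak-guard";

  // Real-time pass: cheap, deterministic check on each chunk.
  async processOutputStream({
    part,
    abort,
  }: {
    part: ChunkType;
    abort: (reason?: string) => never;
  }) {
    if (part.type === "text-delta" && part.payload.text.includes("INTERNAL-")) {
      abort("Internal identifier leaked mid-stream");
    }
    return part;
  }

  // Final pass: a marker split across chunk boundaries still gets caught here.
  async processOutputResult({
    messages,
    abort,
  }: {
    messages: unknown[];
    abort: (reason?: string) => never;
  }) {
    if (JSON.stringify(messages).includes("INTERNAL-")) {
      abort("Internal identifier leaked in final response");
    }
    return messages;
  }
}
```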
Output processors are often a requirement for production AI applications, but keep an eye on the latency and cost they add. We'll be writing more on this shortly.