We're excited to announce that in @mastra/core@0.14.0 we shipped output processors.
Output processors are the gatekeepers of your AI applications. They sit between your language model's response and your users.
While input processors handle what goes into your LLM, output processors handle what comes out. They can catch hallucinations, redact sensitive information, enforce token limits, and ensure your AI stays on-brand.
How Output Processors Work
Output processors operate on the model's output messages in a pipeline: each processor receives the output from the previous one, in the order they're listed in your `outputProcessors` array. If any processor calls `abort()`, the request terminates immediately and subsequent processors don't run. The response includes a `tripwireReason` explaining why the content was blocked.
Output processors can work in two modes:
- streaming (processing chunks as they arrive)
- batch (processing the complete response).
This lets you implement real-time filtering while still doing final validation.
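For example, here's a minimal sketch of checking for a tripwire after a call. It assumes `agent` has already been configured with one or more output processors (configuration is shown later in this post), and that a blocked response surfaces the reason as a `tripwireReason` property, as described above.

```typescript
// Assumes `agent` was created with outputProcessors configured.
const result = await agent.generateVNext("Summarize my account history");

if (result.tripwireReason) {
  // A processor called abort(); the processors after it never ran.
  console.warn("Response blocked:", result.tripwireReason);
} else {
  console.log(result.text);
}
```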
Out-of-the-box processors
Mastra ships a few out-of-the-box processors:
- Content moderation (`ModerationProcessor`)
- PII protection (`PIIDetector`)
- Stream optimization (`BatchPartsProcessor`)
- Security (`SystemPromptScrubber`)
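Here's a rough sketch of how a couple of these could be wired up. The constructor options shown are illustrative assumptions; check each processor's reference docs for the exact names.

```typescript
import { PIIDetector, BatchPartsProcessor } from "@mastra/core/processors";
import { openai } from "@ai-sdk/openai";

// Processors run in array order: the first entry sees the raw model output.
const outputProcessors = [
  // Redact emails, phone numbers, etc. before they reach the user
  // (options here are illustrative).
  new PIIDetector({ model: openai("gpt-4.1-nano"), strategy: "redact" }),
  // Group tiny text-delta chunks into larger parts for smoother streaming.
  new BatchPartsProcessor({ batchSize: 5 }),
];
```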
Custom Output Processors
You can also build processors tailored to your application or business. Here's a processor that ensures customer service responses stay professional:
```typescript
import type { Processor } from "@mastra/core/processors";
import type { ChunkType } from "@mastra/core/stream";

class ToneEnforcer implements Processor {
  readonly name = "tone-enforcer";

  constructor(private requiredTone: "professional" | "casual" | "technical") {}

  async processOutputStream({
    part,
    abort,
  }: {
    part: ChunkType;
    abort: (reason?: string) => never;
  }): Promise<ChunkType> {
    // Only check complete sentences
    if (part.type === "text-delta" && part.payload.text.includes(".")) {
      const tone = await this.analyzeTone(part.payload.text);

      if (tone !== this.requiredTone) {
        abort(`Response tone was ${tone}, expected ${this.requiredTone}`);
      }
    }

    return part;
  }

  // Classify the tone however you like, e.g. with a small, fast model.
  private async analyzeTone(text: string): Promise<string> {
    // ...
    return "professional";
  }
}
```
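Registering it looks the same as with the built-ins; the agent configuration below is a hypothetical example.

```typescript
import { Agent } from "@mastra/core/agent";
import { openai } from "@ai-sdk/openai";

const supportAgent = new Agent({
  name: "support-agent",
  instructions: "Answer customer questions accurately and politely.",
  model: openai("gpt-4o"),
  // The custom processor slots into the same pipeline as the built-ins.
  outputProcessors: [new ToneEnforcer("professional")],
});
```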
Performance Considerations
Output processors add latency to every response. Here's how to minimize impact:
- Use Small Models. You're on the critical path, so we recommend starting with small, fast models unless the accuracy isn't good enough. The difference between `gpt-4o` and `gpt-4.1-nano` can be 10x in latency.
- Put LLM calls at the end. If you're using multiple processors, run deterministic processors before calling an LLM.
- Minimize LLM Output Tokens. Keep the happy path's output as small as possible, as in the schema comparison below.
1import { z } from "zod";
2
3// Bad: Forces verbose responses
4z.object({
5 inappropriate: z.boolean(),
6 reason: z.string(),
7 severity: z.enum(["low", "medium", "high"]),
8 categories: z.array(z.string()),
9});
10
11// Good: Allows minimal responses
12z.object({
13 severity: z.enum(["low", "medium", "high"]).optional(),
14 category: z.string().optional(), // Only if problematic
15});
A clean response is just `{}`: two tokens instead of dozens.
- Push non-critical processing async. Some processors don't need to block the response. Log analytics asynchronously instead of making users wait.
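For instance, here's a sketch of a processor that records the final response for analytics without blocking it. The `processOutputResult` parameter and return shapes are assumptions based on the examples in this post, and the logging sink is a placeholder.

```typescript
import type { Processor } from "@mastra/core/processors";

class AnalyticsLogger implements Processor {
  readonly name = "analytics-logger";

  // Runs once on the complete response, after streaming finishes.
  async processOutputResult({ messages }: { messages: unknown[] }) {
    // Fire and forget: kick off logging but don't await it,
    // so the user gets the response immediately.
    void this.log(messages).catch((err) =>
      console.error("analytics logging failed", err),
    );

    return messages;
  }

  // Placeholder for your analytics sink (queue, HTTP endpoint, etc.).
  private async log(messages: unknown[]): Promise<void> {
    // ...
  }
}
```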
Integration with Streaming
Output processors work with both regular and streaming responses:
```typescript
// Regular generation
const result = await agent.generateVNext("Tell me about quantum computing");
// Processors run, then you get the processed result
console.log(result.text); // Processed text
console.log(result.object); // Structured data if applicable

// Streaming
const stream = await agent.streamVNext("Explain machine learning");
for await (const part of stream.textStream) {
  // Each part is already processed
  process.stdout.write(part);
}
```
For streaming, processors can work on individual chunks (`processOutputStream`) or wait for the complete response (`processOutputResult`). This flexibility lets you implement real-time filtering while still doing final validation.
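A single processor can implement both hooks. As a sketch under the same assumptions as the earlier examples: a per-chunk check can miss a banned string that's split across two chunks, so a final pass over the assembled response catches what the streaming check can't.

```typescript
import type { Processor } from "@mastra/core/processors";
import type { ChunkType } from "@mastra/core/stream";

class LeakGuard implements Processor {
  readonly name = "leak-guard";

  // Real-time pass: cheap, deterministic check on each chunk.
  async processOutputStream({
    part,
    abort,
  }: {
    part: ChunkType;
    abort: (reason?: string) => never;
  }) {
    if (part.type === "text-delta" && part.payload.text.includes("INTERNAL-")) {
      abort("Internal identifier leaked mid-stream");
    }
    return part;
  }

  // Final pass: a marker split across chunk boundaries still gets caught here.
  async processOutputResult({
    messages,
    abort,
  }: {
    messages: unknown[];
    abort: (reason?: string) => never;
  }) {
    if (JSON.stringify(messages).includes("INTERNAL-")) {
      abort("Internal identifier leaked in final response");
    }
    return messages;
  }
}
```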
Output processors are often a requirement for production AI applications, but keep an eye on the latency and cost they add. We'll be writing more on this shortly.