UnicodeNormalizer
The UnicodeNormalizer is an input processor that normalizes Unicode text to ensure consistent formatting and remove potentially problematic characters before messages are sent to the language model. This processor helps maintain text quality by handling various Unicode representations, removing control characters, and standardizing whitespace formatting.
Usage example
import { UnicodeNormalizer } from "@mastra/core/processors";

const processor = new UnicodeNormalizer({
  stripControlChars: true,
  collapseWhitespace: true,
});
Constructor parameters
options?: Options
Configuration options for Unicode text normalization.
Options
stripControlChars?: boolean
Whether to strip control characters. When enabled, removes control characters except \t, \n, and \r.
preserveEmojis?: boolean
Whether to preserve emojis. When disabled, emojis may be removed if they contain control characters.
collapseWhitespace?: boolean
Whether to collapse consecutive whitespace. When enabled, runs of spaces, tabs, or newlines are each collapsed to a single instance.
trim?: boolean
Whether to trim leading and trailing whitespace.
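To make the effect of these options concrete, here is a minimal standalone sketch of how such normalization could be written with plain string operations. It is not the library's implementation: the normalizeText function and NormalizeOptions interface are hypothetical names, the choice of the NFKC normalization form is an assumption (the reference above does not say which form is used), and emoji handling is left out.

```typescript
// Hypothetical stand-in for the processor's options; NOT the library's code.
interface NormalizeOptions {
  stripControlChars?: boolean;
  collapseWhitespace?: boolean;
  trim?: boolean;
}

function normalizeText(text: string, options: NormalizeOptions = {}): string {
  // Assumption: NFKC is used here for illustration only.
  let result = text.normalize("NFKC");

  if (options.stripControlChars) {
    // Remove C0 control characters and DEL, keeping \t (0x09), \n (0x0A), \r (0x0D).
    result = result.replace(/[\u0000-\u0008\u000B\u000C\u000E-\u001F\u007F]/g, "");
  }

  if (options.collapseWhitespace) {
    result = result.replace(/\r\n/g, "\n"); // unify line endings first
    result = result.replace(/\n+/g, "\n"); // runs of newlines -> one newline
    result = result.replace(/[ \t]+/g, " "); // runs of spaces/tabs -> one space
  }

  if (options.trim) {
    result = result.trim();
  }

  return result;
}

// Decomposed "e" + combining accent, an "fi" ligature, a NUL byte, extra spacing.
const cleaned = normalizeText("  cafe\u0301 \t \uFB01le\u0000  ", {
  stripControlChars: true,
  collapseWhitespace: true,
  trim: true,
});
console.log(cleaned); // "café file"
```

The steps run in a fixed order (normalize, strip, collapse, trim) so that whitespace left behind by removed control characters is still collapsed and trimmed.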
Returns
name: string
Processor name, set to 'unicode-normalizer'.
processInput: (args: { messages: MastraMessageV2[]; abort: (reason?: string) => never }) => MastraMessageV2[]
Processes input messages to normalize Unicode text.
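The name and processInput members describe the general shape of an input processor. The sketch below mirrors that shape with a toy processor so the call pattern is easier to see. It is an illustration under stated assumptions: SimpleMessage is a deliberately simplified, hypothetical stand-in for MastraMessageV2 (whose real structure is not shown in this reference), and whitespaceTrimmer is an invented example, not part of the library.

```typescript
// Simplified, hypothetical message shape for illustration only;
// it is NOT the real MastraMessageV2 type.
interface SimpleMessage {
  role: "user" | "assistant" | "system";
  text: string;
}

// Mirrors the documented members: a name plus a processInput function that
// receives the messages and an abort callback and returns the new messages.
interface SimpleInputProcessor {
  name: string;
  processInput(args: {
    messages: SimpleMessage[];
    abort: (reason?: string) => never;
  }): SimpleMessage[];
}

// Invented example processor: trims each message and aborts on empty input.
const whitespaceTrimmer: SimpleInputProcessor = {
  name: "whitespace-trimmer",
  processInput({ messages, abort }) {
    return messages.map((message) => {
      const text = message.text.trim();
      if (text.length === 0) {
        abort("empty message");
      }
      return { ...message, text };
    });
  },
};

// A stand-in abort callback; it never returns because it always throws.
const abort = (reason?: string): never => {
  throw new Error(reason ?? "aborted");
};

const processed = whitespaceTrimmer.processInput({
  messages: [{ role: "user", text: "  hello  " }],
  abort,
});
console.log(processed[0].text); // "hello"
```

Returning new message objects instead of mutating the inputs keeps the processor easy to chain with others in an inputProcessors array.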
Extended usage example
src/mastra/agents/normalized-agent.ts
import { Agent } from "@mastra/core/agent";
import { UnicodeNormalizer } from "@mastra/core/processors";

export const agent = new Agent({
  name: "normalized-agent",
  instructions: "You are a helpful assistant",
  model: "openai/gpt-4o-mini",
  inputProcessors: [
    new UnicodeNormalizer({
      stripControlChars: true,
      preserveEmojis: true,
      collapseWhitespace: true,
      trim: true,
    }),
  ],
});