
UnicodeNormalizer

The UnicodeNormalizer is an input processor that normalizes Unicode text before messages are sent to the language model, so that equivalent text is represented consistently and potentially problematic characters are removed. It normalizes differing Unicode representations of the same text, can strip control characters, and can standardize whitespace.
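Text that looks identical can be encoded as different code point sequences, which is one reason normalization matters. The snippet below uses only the standard JavaScript String.prototype.normalize method to show the problem. NFC is chosen purely for illustration; this page does not say which normalization form UnicodeNormalizer applies.

```typescript
// Two visually identical strings built from different code point sequences.
const composed = "caf\u00e9";    // "é" as one precomposed code point (U+00E9)
const decomposed = "cafe\u0301"; // "e" followed by a combining acute accent (U+0301)

console.log(composed === decomposed);                                  // false
console.log(composed.length, decomposed.length);                        // 4 5
console.log(composed.normalize("NFC") === decomposed.normalize("NFC")); // true
```

Without normalization, string comparisons, deduplication, or filtering can treat these two inputs as different text.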

Usage example

import { UnicodeNormalizer } from "@mastra/core/processors";

const processor = new UnicodeNormalizer({
  stripControlChars: true,
  collapseWhitespace: true,
});

Constructor parameters

options?:

Options
Configuration options for Unicode text normalization

Options

stripControlChars?:

boolean
Whether to strip control characters. When enabled, removes control characters except tab (\t), newline (\n), and carriage return (\r)

preserveEmojis?:

boolean
Whether to preserve emojis. When disabled, emojis may be removed if they contain control characters

collapseWhitespace?:

boolean
Whether to collapse consecutive whitespace. When enabled, multiple spaces/tabs/newlines are collapsed to single instances

trim?:

boolean
Whether to trim leading and trailing whitespace
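The options above can be approximated with a small standalone function. This is a minimal sketch, not Mastra's implementation: the normalizeText function and NormalizeOptions interface are hypothetical, the NFC normalization form is an assumption, "control characters" is read as the Unicode Cc category minus tab, newline, and carriage return, and "collapse" is read as turning runs of spaces/tabs into one space and runs of line breaks into one newline. preserveEmojis is left out because its exact behavior is not specified here.

```typescript
// Hypothetical option shape mirroring the documented option names.
interface NormalizeOptions {
  stripControlChars?: boolean;
  collapseWhitespace?: boolean;
  trim?: boolean;
}

// Illustrative sketch only; not the library's actual code.
function normalizeText(text: string, opts: NormalizeOptions = {}): string {
  // Assumption: canonical composition (NFC) as the normalization form.
  let out = text.normalize("NFC");

  if (opts.stripControlChars) {
    // Remove Cc characters (U+0000-U+001F, U+007F-U+009F),
    // keeping \t (U+0009), \n (U+000A) and \r (U+000D).
    out = out.replace(/[\x00-\x08\x0B\x0C\x0E-\x1F\x7F-\x9F]/g, "");
  }
  if (opts.collapseWhitespace) {
    out = out
      .replace(/[ \t]+/g, " ")        // runs of spaces/tabs become one space
      .replace(/(?:\r?\n)+/g, "\n");  // runs of line breaks become one newline
  }
  if (opts.trim) {
    // Remove leading and trailing whitespace only.
    out = out.trim();
  }
  return out;
}

const raw = "  cafe\u0301\u0000   menu\n\n\nprices\t\t  ";
const cleaned = normalizeText(raw, {
  stripControlChars: true,
  collapseWhitespace: true,
  trim: true,
});
console.log(JSON.stringify(cleaned)); // "café menu\nprices"
```

Applying the steps in this order (normalize, strip, collapse, trim) means whitespace left behind by removed characters is still collapsed and trimmed; the real processor may order them differently.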

Returns

id:

string
Processor identifier set to 'unicode-normalizer'

name?:

string
Optional processor display name

processInput:

(args: { messages: MastraMessageV2[]; abort: (reason?: string) => never }) => MastraMessageV2[]
Processes input messages to normalize Unicode text
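The processInput contract described above (receive the messages plus an abort callback, return the processed messages) can be sketched as follows. The SimpleMessage type and normalize helper are hypothetical simplifications for illustration: the real MastraMessageV2 type carries more structure than a plain content string, and this is not how UnicodeNormalizer is implemented internally.

```typescript
// Hypothetical, simplified message shape; MastraMessageV2 is richer than this.
interface SimpleMessage {
  role: "user" | "assistant" | "system";
  content: string;
}

// Hypothetical normalizer: NFC, collapse spaces/tabs, trim.
const normalize = (text: string): string =>
  text.normalize("NFC").replace(/[ \t]+/g, " ").trim();

// Sketch of a processInput-style function: returns new message objects
// with normalized text and leaves the input messages unmodified.
function processInput(args: {
  messages: SimpleMessage[];
  abort: (reason?: string) => never;
}): SimpleMessage[] {
  return args.messages.map((m) => ({ ...m, content: normalize(m.content) }));
}

const input: SimpleMessage[] = [{ role: "user", content: "  Hello    cafe\u0301  " }];
const result = processInput({
  messages: input,
  abort: (reason) => {
    throw new Error(reason ?? "aborted");
  },
});
console.log(result[0].content); // Hello café
```

Returning copies rather than mutating the incoming messages keeps each processor's effect easy to reason about when several input processors run in sequence.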

Extended usage example

src/mastra/agents/normalized-agent.ts
import { Agent } from "@mastra/core/agent";
import { UnicodeNormalizer } from "@mastra/core/processors";

export const agent = new Agent({
  name: "normalized-agent",
  instructions: "You are a helpful assistant",
  model: "openai/gpt-5.1",
  inputProcessors: [
    new UnicodeNormalizer({
      stripControlChars: true,
      preserveEmojis: true,
      collapseWhitespace: true,
      trim: true,
    }),
  ],
});
