# AWS Nova Sonic voice

The `NovaSonicVoice` class provides real-time speech-to-speech capabilities backed by [AWS Bedrock Nova 2 Sonic](https://docs.aws.amazon.com/nova/latest/userguide/speech.html). It opens a bidirectional stream to the model and emits events for assistant audio, transcribed text, tool calls, turn boundaries, and interruptions.

## Usage example

```typescript
import { NovaSonicVoice } from '@mastra/voice-aws-nova-sonic'
import { playAudio, getMicrophoneStream } from '@mastra/node-audio'

// Initialize using the default AWS credential provider chain
const voice = new NovaSonicVoice({
  region: 'us-east-1',
  speaker: 'matthew',
})

// Or pass explicit credentials
const voiceWithCredentials = new NovaSonicVoice({
  region: 'us-east-1',
  speaker: 'tiffany',
  credentials: {
    accessKeyId: process.env.AWS_ACCESS_KEY_ID!,
    secretAccessKey: process.env.AWS_SECRET_ACCESS_KEY!,
  },
})

// Establish the bidirectional stream
await voice.connect()

// Listen for assistant audio (Int16Array PCM)
voice.on('speaking', ({ audioData }) => {
  if (audioData) playAudio(audioData)
})

// Listen for transcribed text from the user and assistant
voice.on('writing', ({ text, role, generationStage }) => {
  console.log(`${role} (${generationStage ?? 'FINAL'}): ${text}`)
})

// Stream microphone audio in real time
const microphoneStream = getMicrophoneStream()
await voice.send(microphoneStream)

// Disconnect when done
voice.close()
```

## Authentication

`NovaSonicVoice` uses the AWS SDK credential resolution chain when no `credentials` option is passed. Mastra calls `defaultProvider()` from `@aws-sdk/credential-provider-node`, which checks (in order) environment variables, shared credentials files, IAM roles for EC2, ECS, EKS, and other standard sources.
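As an illustrative sketch of what "omitting `credentials`" looks like, the options object below writes out an inline type mirroring the constructor options documented under Configuration (the package exports the real types; this inline shape is only for the example):

```typescript
// Illustrative inline type -- the package exports the real option types.
type NovaSonicVoiceOptionsSketch = {
  region: 'us-east-1' | 'us-west-2' | 'ap-northeast-1'
  speaker?: string
  credentials?: {
    accessKeyId: string
    secretAccessKey: string
    sessionToken?: string
  }
}

// No `credentials` field: the AWS SDK default provider chain resolves
// credentials from environment variables, shared credentials files,
// or an attached IAM role, in that order.
const options: NovaSonicVoiceOptionsSketch = {
  region: 'us-east-1',
  speaker: 'matthew',
}
```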
To use static credentials, pass them to the constructor:

```typescript
new NovaSonicVoice({
  region: 'us-east-1',
  credentials: {
    accessKeyId: process.env.AWS_ACCESS_KEY_ID!,
    secretAccessKey: process.env.AWS_SECRET_ACCESS_KEY!,
    sessionToken: process.env.AWS_SESSION_TOKEN,
  },
})
```

The voice provider never logs credential values.

## Configuration

### Constructor options

- **region** (`'us-east-1' | 'us-west-2' | 'ap-northeast-1'`): AWS region that hosts the Nova Sonic model. (Default: `'us-east-1'`)
- **model** (`string`): Bedrock model ID for the bidirectional stream. (Default: `'amazon.nova-2-sonic-v1:0'`)
- **credentials** (`AwsCredentialIdentity`): Static AWS credentials. When omitted, the default AWS credential provider chain is used.
- **speaker** (`string | NovaSonicVoiceConfigDetails`): Default voice for the assistant. Pass a voice ID string such as `'matthew'` or an object that includes a language code and gender. (Default: `'matthew'`)
- **languageCode** (`NovaSonicLanguageCode`): Language code used for the session. Polyglot voices support all listed languages.
- **instructions** (`string`): System prompt sent at session start. Equivalent to calling `addInstructions()` before `connect()`.
- **tools** (`NovaSonicToolConfig[]`): Tools exposed to the model. When the voice instance is attached to an Agent, the Agent's tools are added automatically.
- **sessionConfig** (`NovaSonicSessionConfig`): Inference, turn-detection, and tool-choice configuration. See Session configuration below.
- **debug** (`boolean`): Enable verbose logging for stream events. Sensitive fields are masked. (Default: `false`)

### Session configuration

`sessionConfig` controls inference parameters and turn-taking behavior. All fields are optional.

- **inferenceConfiguration** (`object`): Sampling and decoding parameters.
- **inferenceConfiguration.maxTokens** (`number`): Maximum tokens generated per turn.
- **inferenceConfiguration.temperature** (`number`): Sampling temperature.
- **inferenceConfiguration.topP** (`number`): Nucleus sampling probability.
- **inferenceConfiguration.topK** (`number`): Top-k sampling.
- **inferenceConfiguration.stopSequences** (`string[]`): Sequences that end generation.
- **turnDetectionConfiguration** (`object`): Endpointing sensitivity for turn detection.
- **turnDetectionConfiguration.endpointingSensitivity** (`'HIGH' | 'MEDIUM' | 'LOW'`): Pause duration before the model considers a turn complete. `HIGH` ends turns fastest (about a 1.5 s pause), `MEDIUM` is balanced (about 1.75 s), and `LOW` waits longest (about 2 s).
- **toolChoice** (`'auto' | 'any' | { tool: { name: string } }`): How the model decides whether to call a tool.
- **enableKnowledgeGrounding** (`boolean`): Enable retrieval-augmented grounding against a Bedrock knowledge base.
- **knowledgeBaseConfig** (`{ knowledgeBaseId?: string; dataSourceId?: string }`): Knowledge base used when knowledge grounding is enabled.

## Methods

### `connect()`

Opens the bidirectional stream to AWS Bedrock and sends the initial session, prompt, and system events. Call this before `speak()`, `listen()`, or `send()`.

- **options** (`{ requestContext?: RequestContext }`): Optional request context propagated to tool calls made during the session.

Returns: `Promise<void>`

### `speak()`

Synthesizes speech for a text prompt and emits `speaking` events as audio is produced.

- **input** (`string | NodeJS.ReadableStream`): Text or text stream to synthesize.
- **options** (`NovaSonicVoiceOptions`): Per-call overrides such as the speaker or language code.

Returns: `Promise<void>`

### `send()`

Streams microphone audio (or any PCM source) to the model. Use this for live, continuous conversation.

- **audioData** (`NodeJS.ReadableStream | Int16Array`): 16-bit PCM audio to forward to the model.

Returns: `Promise<void>`

### `listen()`

Convenience wrapper that delegates to `send()`. Use it when you want a single transcription pass over a finite audio stream.

- **audioData** (`NodeJS.ReadableStream`): Audio stream to transcribe.
Returns: `Promise<void>`

### `endAudioInput()`

Signals the end of the current audio turn so the model can finalize its response. Call this when the user stops speaking and the provider is not configured for server-side turn detection.

Returns: `Promise<void>`

### `addInstructions()`

Updates the system prompt for the active session.

- **instructions** (`string`): System prompt to apply to the session.

Returns: `void`

### `addTools()`

Registers tools with the voice instance. When `NovaSonicVoice` is attached to an Agent, the Agent's tools are added automatically.

- **tools** (`ToolsInput`): Tools exposed to the model.

Returns: `void`

### `getSpeakers()`

Returns the list of voices supported by Nova 2 Sonic.

Returns: `Promise<Array<{ voiceId: string }>>`

### `getListener()`

Returns whether the voice instance currently holds an open stream.

Returns: `Promise<{ enabled: boolean }>`

### `close()`

Closes the bidirectional stream and destroys the underlying Bedrock client. Call this when the conversation ends.

Returns: `void`

### `on()` / `off()`

Registers and removes event listeners. See [Voice events](https://mastra.ai/reference/voice/voice.events) for the shared event API.

## Events

`NovaSonicVoice` emits the following events:

- **speaking**: Assistant audio chunk. Callback receives `{ audioData: Int16Array, sampleRate?: number }`.
- **writing**: Transcribed text from the user or assistant. Callback receives `{ text: string, role: 'assistant' | 'user', generationStage?: 'SPECULATIVE' | 'FINAL' }`.
- **toolCall**: Model requested a tool call. Callback receives `{ name: string, args: Record<string, unknown>, id: string }`.
- **interrupt**: User or model interrupted the current turn. Callback receives `{ type: 'user' | 'model', timestamp: number }`.
- **turnComplete**: Model finished its turn. Callback receives `{ timestamp: number }`.
- **session**: Session state transition. Callback receives `{ state: 'connecting' | 'connected' | 'disconnected' | 'disconnecting' | 'error' }`.
- **usage**: Token usage for the turn. Callback receives `{ inputTokens: number, outputTokens: number, totalTokens: number }`.
- **error**: Stream or provider error. Callback receives `{ message: string, code?: string, details?: unknown }`.

`generationStage` distinguishes provisional transcripts (`'SPECULATIVE'`) from finalized ones (`'FINAL'`). Use `'FINAL'` text for persistent storage and `'SPECULATIVE'` text for live captions.

## Available voices

Nova 2 Sonic ships voices in ten locales. Tiffany and Matthew are polyglot and can speak any supported language.

| Voice ID   | Name     | Language   | Locale | Gender    | Polyglot |
| ---------- | -------- | ---------- | ------ | --------- | -------- |
| `tiffany`  | Tiffany  | English    | en-US  | feminine  | yes      |
| `matthew`  | Matthew  | English    | en-US  | masculine | yes      |
| `amy`      | Amy      | English    | en-GB  | feminine  | no       |
| `olivia`   | Olivia   | English    | en-AU  | feminine  | no       |
| `kiara`    | Kiara    | English    | en-IN  | feminine  | no       |
| `arjun`    | Arjun    | English    | en-IN  | masculine | no       |
| `ambre`    | Ambre    | French     | fr-FR  | feminine  | no       |
| `florian`  | Florian  | French     | fr-FR  | masculine | no       |
| `beatrice` | Beatrice | Italian    | it-IT  | feminine  | no       |
| `lorenzo`  | Lorenzo  | Italian    | it-IT  | masculine | no       |
| `tina`     | Tina     | German     | de-DE  | feminine  | no       |
| `lennart`  | Lennart  | German     | de-DE  | masculine | no       |
| `lupe`     | Lupe     | Spanish    | es-US  | feminine  | no       |
| `carlos`   | Carlos   | Spanish    | es-US  | masculine | no       |
| `carolina` | Carolina | Portuguese | pt-BR  | feminine  | no       |
| `leo`      | Leo      | Portuguese | pt-BR  | masculine | no       |
| `kiara`    | Kiara    | Hindi      | hi-IN  | feminine  | no       |
| `arjun`    | Arjun    | Hindi      | hi-IN  | masculine | no       |

## Notes

- Audio is streamed as 16-bit PCM. Assistant audio is emitted as `Int16Array` on the `speaking` event.
- The voice instance must call `connect()` before any other streaming method.
- `close()` destroys the underlying `BedrockRuntimeClient` to release the HTTP/2 session.
- Nova 2 Sonic is available in `us-east-1`, `us-west-2`, and `ap-northeast-1`. Other regions throw a configuration error during construction.
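The `generationStage` guidance above (persist `'FINAL'` text, show `'SPECULATIVE'` text only as live captions) can be sketched as a small handler. The `handleWriting` helper and `WritingEvent` type below are hypothetical names for this example; only the event payload shape comes from the `writing` event documented above:

```typescript
// Shape of the `writing` event payload described in the Events section.
interface WritingEvent {
  text: string
  role: 'assistant' | 'user'
  generationStage?: 'SPECULATIVE' | 'FINAL'
}

// Hypothetical helper: the live caption always shows the latest text,
// but only FINAL (or unstaged, treated as FINAL) text is appended to
// the persistent transcript store.
function handleWriting(
  event: WritingEvent,
  store: string[],
): { caption: string; store: string[] } {
  const stage = event.generationStage ?? 'FINAL'
  if (stage === 'FINAL') {
    store.push(`${event.role}: ${event.text}`)
  }
  return { caption: event.text, store }
}
```

A handler like this would typically be registered via `voice.on('writing', ...)` so speculative chunks update the UI without polluting stored history.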