AWS Nova Sonic voice

The NovaSonicVoice class provides real-time speech-to-speech capabilities backed by AWS Bedrock Nova 2 Sonic. It opens a bidirectional stream to the model and emits events for assistant audio, transcribed text, tool calls, turn boundaries, and interruptions.

Usage example

src/mastra/voice.ts
import { NovaSonicVoice } from '@mastra/voice-aws-nova-sonic'
import { playAudio, getMicrophoneStream } from '@mastra/node-audio'

// Initialize using the default AWS credential provider chain
const voice = new NovaSonicVoice({
  region: 'us-east-1',
  speaker: 'matthew',
})

// Or pass explicit credentials
const voiceWithCredentials = new NovaSonicVoice({
  region: 'us-east-1',
  speaker: 'tiffany',
  credentials: {
    accessKeyId: process.env.AWS_ACCESS_KEY_ID!,
    secretAccessKey: process.env.AWS_SECRET_ACCESS_KEY!,
  },
})

// Establish the bidirectional stream
await voice.connect()

// Listen for assistant audio (Int16Array PCM)
voice.on('speaking', ({ audioData }) => {
  if (audioData) playAudio(audioData)
})

// Listen for transcribed text from the user and assistant
voice.on('writing', ({ text, role, generationStage }) => {
  console.log(`${role} (${generationStage ?? 'FINAL'}): ${text}`)
})

// Stream microphone audio in real time
const microphoneStream = getMicrophoneStream()
await voice.send(microphoneStream)

// Disconnect when done
voice.close()

Authentication

NovaSonicVoice uses the AWS SDK credential resolution chain when no credentials option is passed. Mastra calls defaultProvider() from @aws-sdk/credential-provider-node, which checks, in order: environment variables, shared credentials files, and IAM roles for EC2, ECS, and EKS, among other standard sources.

To use static credentials, pass them on the constructor:

new NovaSonicVoice({
  region: 'us-east-1',
  credentials: {
    accessKeyId: process.env.AWS_ACCESS_KEY_ID!,
    secretAccessKey: process.env.AWS_SECRET_ACCESS_KEY!,
    sessionToken: process.env.AWS_SESSION_TOKEN,
  },
})

The voice provider never logs credential values.

Configuration

Constructor options

region?: 'us-east-1' | 'us-west-2' | 'ap-northeast-1' = 'us-east-1'
AWS region that hosts the Nova Sonic model.

model?: string = 'amazon.nova-2-sonic-v1:0'
Bedrock model ID for the bidirectional stream.

credentials?: AwsCredentialIdentity
Static AWS credentials. When omitted, the default AWS credential provider chain is used.

speaker?: string | NovaSonicVoiceConfigDetails = 'matthew'
Default voice for the assistant. Pass a voice ID string such as 'matthew', or an object that includes a language code and gender.

languageCode?: NovaSonicLanguageCode
Language code used for the session. Polyglot voices support all listed languages.

instructions?: string
System prompt sent at session start. Equivalent to calling addInstructions() before connect().

tools?: NovaSonicToolConfig[]
Tools exposed to the model. When the voice instance is attached to an Agent, the Agent's tools are added automatically.

sessionConfig?: NovaSonicSessionConfig
Inference, turn-detection, and tool-choice configuration. See Session configuration below.

debug?: boolean = false
Enable verbose logging for stream events. Sensitive fields are masked.

Session configuration

sessionConfig controls inference parameters and turn-taking behavior. All fields are optional.

inferenceConfiguration?: object
Sampling and decoding parameters.

  maxTokens?: number
  Maximum tokens generated per turn.

  temperature?: number
  Sampling temperature.

  topP?: number
  Nucleus sampling probability.

  topK?: number
  Top-k sampling.

  stopSequences?: string[]
  Sequences that end generation.

turnDetectionConfiguration?: object
Endpointing sensitivity for turn detection.

  endpointingSensitivity?: 'HIGH' | 'MEDIUM' | 'LOW'
  Pause duration before the model considers a turn complete. HIGH ends turns fastest (about a 1.5s pause), MEDIUM is balanced (about 1.75s), LOW waits longest (about 2s).

toolChoice?: 'auto' | 'any' | { tool: { name: string } }
How the model decides whether to call a tool.

enableKnowledgeGrounding?: boolean
Enable retrieval-augmented grounding against a Bedrock knowledge base.

knowledgeBaseConfig?: { knowledgeBaseId?: string; dataSourceId?: string }
Knowledge base used when knowledge grounding is enabled.
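As a concrete sketch, the fields above compose like this. The specific values are illustrative, not recommendations:

```typescript
// Illustrative sessionConfig; every field is optional.
const sessionConfig = {
  inferenceConfiguration: {
    maxTokens: 1024, // cap each assistant turn
    temperature: 0.7,
    topP: 0.9,
  },
  turnDetectionConfiguration: {
    // MEDIUM ends the turn after roughly a 1.75s pause
    endpointingSensitivity: 'MEDIUM' as const,
  },
  toolChoice: 'auto' as const,
}

// Pass it on the constructor:
// const voice = new NovaSonicVoice({ region: 'us-east-1', sessionConfig })
```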

Methods

connect()

Opens the bidirectional stream to AWS Bedrock and sends the initial session, prompt, and system events. Call this before speak, listen, or send.

options?: { requestContext?: RequestContext }
Optional request context propagated to tool calls made during the session.

Returns: Promise<void>

speak()

Synthesizes speech for a text prompt and emits speaking events as audio is produced.

input: string | NodeJS.ReadableStream
Text or text stream to synthesize.

options?: NovaSonicVoiceOptions
Per-call overrides such as the speaker or language code.

Returns: Promise<void>

send()

Streams microphone audio (or any PCM source) to the model. Use this for live, continuous conversation.

audioData: NodeJS.ReadableStream | Int16Array
16-bit PCM audio to forward to the model.

Returns: Promise<void>
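send() expects 16-bit PCM, but capture pipelines (Web Audio, for instance) often produce Float32 samples in [-1, 1]. Whether your source needs this conversion is an assumption about your setup; a minimal helper:

```typescript
// Convert Float32 samples in [-1, 1] to 16-bit signed PCM.
function floatTo16BitPCM(input: Float32Array): Int16Array {
  const out = new Int16Array(input.length)
  for (let i = 0; i < input.length; i++) {
    const s = Math.max(-1, Math.min(1, input[i])) // clamp out-of-range samples
    out[i] = s < 0 ? s * 0x8000 : s * 0x7fff // scale to the int16 range
  }
  return out
}

// Usage with send():
// await voice.send(floatTo16BitPCM(floatChunk))
```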

listen()

Convenience wrapper that delegates to send(). Use it when you want a single transcription pass over a finite audio stream.

audioData: NodeJS.ReadableStream
Audio stream to transcribe.

Returns: Promise<void>

endAudioInput()

Signals the end of the current audio turn so the model can finalize its response. Call this when the user stops speaking and the provider is not configured for server-side turn detection.

Returns: Promise<void>
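In a push-to-talk flow you can gate the call on whether server-side turn detection is configured. clientMustEndTurn below is a hypothetical helper, not part of the API:

```typescript
type SessionConfigSketch = {
  turnDetectionConfiguration?: { endpointingSensitivity?: 'HIGH' | 'MEDIUM' | 'LOW' }
}

// When no turn-detection configuration is present, the client is responsible
// for signaling end-of-turn with endAudioInput().
function clientMustEndTurn(config: SessionConfigSketch): boolean {
  return config.turnDetectionConfiguration === undefined
}

// Push-to-talk sketch:
// await voice.send(chunk)                                            // while the button is held
// if (clientMustEndTurn(sessionConfig)) await voice.endAudioInput()  // on release
```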

addInstructions()

Updates the system prompt for the active session.

instructions?: string
System prompt to apply to the session.

Returns: void

addTools()

Registers tools with the voice instance. When NovaSonicVoice is attached to an Agent, the Agent's tools are added automatically.

tools?: ToolsInput
Tools exposed to the model.

Returns: void

getSpeakers()

Returns the list of voices supported by Nova 2 Sonic.

Returns: Promise<Array<{ voiceId: string; name: string; language: string; locale: string; gender: 'masculine' | 'feminine'; polyglot: boolean }>>
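Using the documented return shape, you can, for example, keep only polyglot voices when one assistant must switch languages mid-session. polyglotVoiceIds is a hypothetical helper, and the sample entries mirror the voice table below:

```typescript
// Matches the documented getSpeakers() entry shape.
type SpeakerInfo = {
  voiceId: string
  name: string
  language: string
  locale: string
  gender: 'masculine' | 'feminine'
  polyglot: boolean
}

// Keep only voices that can speak any supported language.
function polyglotVoiceIds(speakers: SpeakerInfo[]): string[] {
  return speakers.filter((s) => s.polyglot).map((s) => s.voiceId)
}

// Usage:
// const speakers = await voice.getSpeakers()
// const ids = polyglotVoiceIds(speakers)
```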

getListener()

Returns whether the voice instance currently holds an open stream.

Returns: Promise<{ enabled: boolean }>

close()

Closes the bidirectional stream and destroys the underlying Bedrock client. Call this when the conversation ends.

Returns: void

on() / off()

Registers and removes event listeners. See Voice events for the shared event API.

Events

NovaSonicVoice emits the following events:

speaking
Assistant audio chunk. Callback receives { audioData: Int16Array, sampleRate?: number }.

writing
Transcribed text from the user or assistant. Callback receives { text: string, role: 'assistant' | 'user', generationStage?: 'SPECULATIVE' | 'FINAL' }.

toolCall
Model requested a tool call. Callback receives { name: string, args: Record<string, any>, id: string }.

interrupt
User or model interrupted the current turn. Callback receives { type: 'user' | 'model', timestamp: number }.

turnComplete
Model finished its turn. Callback receives { timestamp: number }.

session
Session state transition. Callback receives { state: 'connecting' | 'connected' | 'disconnected' | 'disconnecting' | 'error' }.

usage
Token usage for the turn. Callback receives { inputTokens: number, outputTokens: number, totalTokens: number }.

error
Stream or provider error. Callback receives { message: string, code?: string, details?: unknown }.

generationStage distinguishes provisional transcripts ('SPECULATIVE') from finalized ones ('FINAL'). Use 'FINAL' text for persistent storage and 'SPECULATIVE' text for live captions.
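One way to consume writing events is to keep a live caption that each FINAL event commits to the transcript. This sketch assumes every SPECULATIVE event carries the full provisional text for the turn, which is an assumption about payload semantics rather than something the event contract guarantees:

```typescript
// Matches the documented writing event payload.
type WritingEvent = {
  text: string
  role: 'assistant' | 'user'
  generationStage?: 'SPECULATIVE' | 'FINAL'
}

type TranscriptState = { live: string; committed: string[] }

// SPECULATIVE text replaces the live caption; FINAL text (the default when
// generationStage is absent) clears the caption and commits a transcript line.
function applyWriting(state: TranscriptState, ev: WritingEvent): TranscriptState {
  if (ev.generationStage === 'SPECULATIVE') {
    return { live: ev.text, committed: state.committed }
  }
  return { live: '', committed: [...state.committed, `${ev.role}: ${ev.text}`] }
}

// Wiring it up:
// let state: TranscriptState = { live: '', committed: [] }
// voice.on('writing', (ev) => { state = applyWriting(state, ev) })
```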

Available voices

Nova 2 Sonic ships voices in ten locales. Tiffany and Matthew are polyglot and can speak any supported language.

Voice ID | Name     | Language   | Locale | Gender    | Polyglot
tiffany  | Tiffany  | English    | en-US  | feminine  | yes
matthew  | Matthew  | English    | en-US  | masculine | yes
amy      | Amy      | English    | en-GB  | feminine  | no
olivia   | Olivia   | English    | en-AU  | feminine  | no
kiara    | Kiara    | English    | en-IN  | feminine  | no
arjun    | Arjun    | English    | en-IN  | masculine | no
ambre    | Ambre    | French     | fr-FR  | feminine  | no
florian  | Florian  | French     | fr-FR  | masculine | no
beatrice | Beatrice | Italian    | it-IT  | feminine  | no
lorenzo  | Lorenzo  | Italian    | it-IT  | masculine | no
tina     | Tina     | German     | de-DE  | feminine  | no
lennart  | Lennart  | German     | de-DE  | masculine | no
lupe     | Lupe     | Spanish    | es-US  | feminine  | no
carlos   | Carlos   | Spanish    | es-US  | masculine | no
carolina | Carolina | Portuguese | pt-BR  | feminine  | no
leo      | Leo      | Portuguese | pt-BR  | masculine | no
kiara    | Kiara    | Hindi      | hi-IN  | feminine  | no
arjun    | Arjun    | Hindi      | hi-IN  | masculine | no

Notes

  • Audio is streamed as 16-bit PCM. Assistant audio is emitted as Int16Array on the speaking event.
  • The voice instance must call connect() before any other streaming method.
  • close() destroys the underlying BedrockRuntimeClient to release the HTTP/2 session.
  • Nova 2 Sonic is available in us-east-1, us-west-2, and ap-northeast-1. Other regions throw a configuration error during construction.