AWS Nova Sonic voice

The NovaSonicVoice class provides real-time speech-to-speech capabilities backed by AWS Bedrock Nova 2 Sonic. It opens a bidirectional stream to the model and emits events for assistant audio, transcribed text, tool calls, turn boundaries, and interruptions.

Usage example

src/mastra/voice.ts
import { NovaSonicVoice } from '@mastra/voice-aws-nova-sonic'
import { playAudio, getMicrophoneStream } from '@mastra/node-audio'

// Initialize using the default AWS credential provider chain
const voice = new NovaSonicVoice({
  region: 'us-east-1',
  speaker: 'matthew',
})

// Or pass explicit credentials
const voiceWithCredentials = new NovaSonicVoice({
  region: 'us-east-1',
  speaker: 'tiffany',
  credentials: {
    accessKeyId: process.env.AWS_ACCESS_KEY_ID!,
    secretAccessKey: process.env.AWS_SECRET_ACCESS_KEY!,
  },
})

// Establish the bidirectional stream
await voice.connect()

// Listen for assistant audio (Int16Array PCM)
voice.on('speaking', ({ audioData }) => {
  if (audioData) playAudio(audioData)
})

// Listen for transcribed text from the user and assistant
voice.on('writing', ({ text, role, generationStage }) => {
  console.log(`${role} (${generationStage ?? 'FINAL'}): ${text}`)
})

// Stream microphone audio in real time
const microphoneStream = getMicrophoneStream()
await voice.send(microphoneStream)

// Disconnect when done
voice.close()

Authentication

NovaSonicVoice uses the AWS SDK credential resolution chain when no credentials option is passed. Mastra calls defaultProvider() from @aws-sdk/credential-provider-node, which checks, in order: environment variables, shared credentials files, and IAM roles for EC2, ECS, and EKS, among other standard sources.

To use static credentials, pass them on the constructor:

new NovaSonicVoice({
  region: 'us-east-1',
  credentials: {
    accessKeyId: process.env.AWS_ACCESS_KEY_ID!,
    secretAccessKey: process.env.AWS_SECRET_ACCESS_KEY!,
    sessionToken: process.env.AWS_SESSION_TOKEN,
  },
})

The voice provider never logs credential values.

Configuration

Constructor options

region?: 'us-east-1' | 'us-west-2' | 'ap-northeast-1' = 'us-east-1'
AWS region that hosts the Nova Sonic model.

model?: string = 'amazon.nova-2-sonic-v1:0'
Bedrock model ID for the bidirectional stream.

credentials?: AwsCredentialIdentity
Static AWS credentials. When omitted, the default AWS credential provider chain is used.

speaker?: string | NovaSonicVoiceConfigDetails = 'matthew'
Default voice for the assistant. Pass a voice ID string such as 'matthew', or an object that includes a language code and gender.

languageCode?: NovaSonicLanguageCode
Language code used for the session. Polyglot voices support all listed languages.

instructions?: string
System prompt sent at session start. Equivalent to calling addInstructions() before connect().

tools?: NovaSonicToolConfig[]
Tools exposed to the model. When the voice instance is attached to an Agent, the Agent's tools are added automatically.

sessionConfig?: NovaSonicSessionConfig
Inference, turn-detection, and tool-choice configuration. See Session configuration below.

debug?: boolean = false
Enable verbose logging for stream events. Sensitive fields are masked.

Session configuration

sessionConfig controls inference parameters and turn-taking behavior. All fields are optional.

inferenceConfiguration?: object
Sampling and decoding parameters.

  maxTokens?: number
  Maximum tokens generated per turn.

  temperature?: number
  Sampling temperature.

  topP?: number
  Nucleus sampling probability.

  topK?: number
  Top-k sampling.

  stopSequences?: string[]
  Sequences that end generation.

turnDetectionConfiguration?: object
Endpointing sensitivity for turn detection.

  endpointingSensitivity?: 'HIGH' | 'MEDIUM' | 'LOW'
  Pause duration before the model considers a turn complete. HIGH ends turns fastest (about a 1.5s pause), MEDIUM is balanced (about 1.75s), LOW waits longest (about 2s).

toolChoice?: 'auto' | 'any' | { tool: { name: string } }
How the model decides whether to call a tool.

enableKnowledgeGrounding?: boolean
Enable retrieval-augmented grounding against a Bedrock knowledge base.

knowledgeBaseConfig?: { knowledgeBaseId?: string; dataSourceId?: string }
Knowledge base used when knowledge grounding is enabled.
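As a concrete sketch, the fields above compose like this. The specific values are illustrative, not recommendations:

```typescript
// Illustrative sessionConfig; every field is optional.
const sessionConfig = {
  inferenceConfiguration: {
    maxTokens: 1024, // cap each assistant turn
    temperature: 0.7,
    topP: 0.9,
  },
  turnDetectionConfiguration: {
    // MEDIUM ends the turn after roughly a 1.75s pause
    endpointingSensitivity: 'MEDIUM' as const,
  },
  toolChoice: 'auto' as const,
}

// Pass it on the constructor:
// const voice = new NovaSonicVoice({ region: 'us-east-1', sessionConfig })
```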

Methods

connect()

Opens the bidirectional stream to AWS Bedrock and sends the initial session, prompt, and system events. Call this before speak, listen, or send.

options?: { requestContext?: RequestContext }
Optional request context propagated to tool calls made during the session.

Returns: Promise<void>

speak()

Synthesizes speech for a text prompt and emits speaking events as audio is produced.

input: string | NodeJS.ReadableStream
Text or text stream to synthesize.

options?: NovaSonicVoiceOptions
Per-call overrides such as the speaker or language code.

Returns: Promise<void>

send()

Streams microphone audio (or any PCM source) to the model. Use this for live, continuous conversation.

audioData: NodeJS.ReadableStream | Int16Array
16-bit PCM audio to forward to the model.

Returns: Promise<void>
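send() expects 16-bit PCM, but capture pipelines (Web Audio, for instance) often produce Float32 samples in [-1, 1]. Whether your source needs this conversion is an assumption about your setup; a minimal helper:

```typescript
// Convert Float32 samples in [-1, 1] to 16-bit signed PCM.
function floatTo16BitPCM(input: Float32Array): Int16Array {
  const out = new Int16Array(input.length)
  for (let i = 0; i < input.length; i++) {
    const s = Math.max(-1, Math.min(1, input[i])) // clamp out-of-range samples
    out[i] = s < 0 ? s * 0x8000 : s * 0x7fff // scale to the int16 range
  }
  return out
}

// Usage with send():
// await voice.send(floatTo16BitPCM(floatChunk))
```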

listen()

Convenience wrapper that delegates to send(). Use it when you want a single transcription pass over a finite audio stream.

audioData: NodeJS.ReadableStream
Audio stream to transcribe.

Returns: Promise<void>

endAudioInput()

Signals the end of the current audio turn so the model can finalize its response. Call this when the user stops speaking and the provider is not configured for server-side turn detection.

Returns: Promise<void>
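In a push-to-talk flow you can gate the call on whether server-side turn detection is configured. clientMustEndTurn below is a hypothetical helper, not part of the API:

```typescript
type SessionConfigSketch = {
  turnDetectionConfiguration?: { endpointingSensitivity?: 'HIGH' | 'MEDIUM' | 'LOW' }
}

// When no turn-detection configuration is present, the client is responsible
// for signaling end-of-turn with endAudioInput().
function clientMustEndTurn(config: SessionConfigSketch): boolean {
  return config.turnDetectionConfiguration === undefined
}

// Push-to-talk sketch:
// await voice.send(chunk)                                            // while the button is held
// if (clientMustEndTurn(sessionConfig)) await voice.endAudioInput()  // on release
```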

addInstructions()

Updates the system prompt for the active session.

instructions?: string
System prompt to apply to the session.

Returns: void

addTools()

Registers tools with the voice instance. When NovaSonicVoice is attached to an Agent, the Agent's tools are added automatically.

tools?: ToolsInput
Tools exposed to the model.

Returns: void

getSpeakers()

Returns the list of voices supported by Nova 2 Sonic.

Returns: Promise<Array<{ voiceId: string; name: string; language: string; locale: string; gender: 'masculine' | 'feminine'; polyglot: boolean }>>
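Using the documented return shape, you can, for example, keep only polyglot voices when one assistant must switch languages mid-session. polyglotVoiceIds is a hypothetical helper, and the sample entries mirror the voice table below:

```typescript
// Matches the documented getSpeakers() entry shape.
type SpeakerInfo = {
  voiceId: string
  name: string
  language: string
  locale: string
  gender: 'masculine' | 'feminine'
  polyglot: boolean
}

// Keep only voices that can speak any supported language.
function polyglotVoiceIds(speakers: SpeakerInfo[]): string[] {
  return speakers.filter((s) => s.polyglot).map((s) => s.voiceId)
}

// Usage:
// const speakers = await voice.getSpeakers()
// const ids = polyglotVoiceIds(speakers)
```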

getListener()

Returns whether the voice instance currently holds an open stream.

Returns: Promise<{ enabled: boolean }>

close()

Closes the bidirectional stream and destroys the underlying Bedrock client. Call this when the conversation ends.

Returns: void

on() / off()

Registers and removes event listeners. See Voice events for the shared event API.

Events

NovaSonicVoice emits the following events:

speaking
Assistant audio chunk. Callback receives { audioData: Int16Array, sampleRate?: number }.

writing
Transcribed text from the user or assistant. Callback receives { text: string, role: 'assistant' | 'user', generationStage?: 'SPECULATIVE' | 'FINAL' }.

toolCall
Model requested a tool call. Callback receives { name: string, args: Record<string, any>, id: string }.

interrupt
User or model interrupted the current turn. Callback receives { type: 'user' | 'model', timestamp: number }.

turnComplete
Model finished its turn. Callback receives { timestamp: number }.

session
Session state transition. Callback receives { state: 'connecting' | 'connected' | 'disconnected' | 'disconnecting' | 'error' }.

usage
Token usage for the turn. Callback receives { inputTokens: number, outputTokens: number, totalTokens: number }.

error
Stream or provider error. Callback receives { message: string, code?: string, details?: unknown }.

generationStage distinguishes provisional transcripts ('SPECULATIVE') from finalized ones ('FINAL'). Use 'FINAL' text for persistent storage and 'SPECULATIVE' text for live captions.
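One way to consume writing events is to keep a live caption that each FINAL event commits to the transcript. This sketch assumes every SPECULATIVE event carries the full provisional text for the turn, which is an assumption about payload semantics rather than something the event contract guarantees:

```typescript
// Matches the documented writing event payload.
type WritingEvent = {
  text: string
  role: 'assistant' | 'user'
  generationStage?: 'SPECULATIVE' | 'FINAL'
}

type TranscriptState = { live: string; committed: string[] }

// SPECULATIVE text replaces the live caption; FINAL text (the default when
// generationStage is absent) clears the caption and commits a transcript line.
function applyWriting(state: TranscriptState, ev: WritingEvent): TranscriptState {
  if (ev.generationStage === 'SPECULATIVE') {
    return { live: ev.text, committed: state.committed }
  }
  return { live: '', committed: [...state.committed, `${ev.role}: ${ev.text}`] }
}

// Wiring it up:
// let state: TranscriptState = { live: '', committed: [] }
// voice.on('writing', (ev) => { state = applyWriting(state, ev) })
```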

Available voices

Nova 2 Sonic ships voices in ten locales. Tiffany and Matthew are polyglot and can speak any supported language.

Voice ID | Name     | Language   | Locale | Gender    | Polyglot
tiffany  | Tiffany  | English    | en-US  | feminine  | yes
matthew  | Matthew  | English    | en-US  | masculine | yes
amy      | Amy      | English    | en-GB  | feminine  | no
olivia   | Olivia   | English    | en-AU  | feminine  | no
kiara    | Kiara    | English    | en-IN  | feminine  | no
arjun    | Arjun    | English    | en-IN  | masculine | no
ambre    | Ambre    | French     | fr-FR  | feminine  | no
florian  | Florian  | French     | fr-FR  | masculine | no
beatrice | Beatrice | Italian    | it-IT  | feminine  | no
lorenzo  | Lorenzo  | Italian    | it-IT  | masculine | no
tina     | Tina     | German     | de-DE  | feminine  | no
lennart  | Lennart  | German     | de-DE  | masculine | no
lupe     | Lupe     | Spanish    | es-US  | feminine  | no
carlos   | Carlos   | Spanish    | es-US  | masculine | no
carolina | Carolina | Portuguese | pt-BR  | feminine  | no
leo      | Leo      | Portuguese | pt-BR  | masculine | no
kiara    | Kiara    | Hindi      | hi-IN  | feminine  | no
arjun    | Arjun    | Hindi      | hi-IN  | masculine | no

Notes

  • Audio is streamed as 16-bit PCM. Assistant audio is emitted as Int16Array on the speaking event.
  • The voice instance must call connect() before any other streaming method.
  • close() destroys the underlying BedrockRuntimeClient to release the HTTP/2 session.
  • Nova 2 Sonic is available in us-east-1, us-west-2, and ap-northeast-1. Other regions throw a configuration error during construction.