AWS Nova Sonic voice
The NovaSonicVoice class provides real-time speech-to-speech capabilities backed by AWS Bedrock Nova 2 Sonic. It opens a bidirectional stream to the model and emits events for assistant audio, transcribed text, tool calls, turn boundaries, and interruptions.
Usage example
import { NovaSonicVoice } from '@mastra/voice-aws-nova-sonic'
import { playAudio, getMicrophoneStream } from '@mastra/node-audio'

// Initialize using the default AWS credential provider chain
const voice = new NovaSonicVoice({
  region: 'us-east-1',
  speaker: 'matthew',
})

// Or pass explicit credentials
const voiceWithCredentials = new NovaSonicVoice({
  region: 'us-east-1',
  speaker: 'tiffany',
  credentials: {
    accessKeyId: process.env.AWS_ACCESS_KEY_ID!,
    secretAccessKey: process.env.AWS_SECRET_ACCESS_KEY!,
  },
})

// Establish the bidirectional stream
await voice.connect()

// Listen for assistant audio (Int16Array PCM)
voice.on('speaking', ({ audioData }) => {
  if (audioData) playAudio(audioData)
})

// Listen for transcribed text from the user and assistant
voice.on('writing', ({ text, role, generationStage }) => {
  console.log(`${role} (${generationStage ?? 'FINAL'}): ${text}`)
})

// Stream microphone audio in real time
const microphoneStream = getMicrophoneStream()
await voice.send(microphoneStream)

// Disconnect when done
voice.close()
Authentication
When no credentials option is passed, NovaSonicVoice uses the AWS SDK credential resolution chain. Mastra calls defaultProvider() from @aws-sdk/credential-provider-node, which checks the standard sources in order, including environment variables, shared credentials files, and IAM roles for EC2, ECS, and EKS.
To use static credentials, pass them on the constructor:
new NovaSonicVoice({
  region: 'us-east-1',
  credentials: {
    accessKeyId: process.env.AWS_ACCESS_KEY_ID!,
    secretAccessKey: process.env.AWS_SECRET_ACCESS_KEY!,
    sessionToken: process.env.AWS_SESSION_TOKEN,
  },
})
The voice provider never logs credential values.
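The choice between static credentials and the default chain can be made in one place. The sketch below is a hypothetical helper (credentialsFromEnv is not part of the package) that returns a credentials object only when a complete key pair is present, so the caller can otherwise omit the option and let the default chain take over.

```typescript
// Hypothetical helper, not part of @mastra/voice-aws-nova-sonic.
interface StaticCredentials {
  accessKeyId: string
  secretAccessKey: string
  sessionToken?: string
}

type Env = Record<string, string | undefined>

function credentialsFromEnv(env: Env): StaticCredentials | undefined {
  const accessKeyId = env['AWS_ACCESS_KEY_ID']
  const secretAccessKey = env['AWS_SECRET_ACCESS_KEY']
  // A partial key pair is unusable, so signal "use the default chain" instead.
  if (!accessKeyId || !secretAccessKey) return undefined

  const creds: StaticCredentials = { accessKeyId, secretAccessKey }
  // Temporary (STS) credentials also carry a session token.
  const sessionToken = env['AWS_SESSION_TOKEN']
  if (sessionToken) creds.sessionToken = sessionToken
  return creds
}

// Intended use (assumption): omit the `credentials` key when nothing was found.
//   const credentials = credentialsFromEnv(process.env)
//   new NovaSonicVoice({ region: 'us-east-1', ...(credentials ? { credentials } : {}) })
```

Returning undefined rather than an empty object keeps the fallback explicit: the constructor only ever sees either a full key pair or no credentials option at all.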
Configuration
Constructor options
region?:
model?:
credentials?:
speaker?:
languageCode?:
instructions?:
tools?:
sessionConfig?:
debug?:
Session configuration
sessionConfig controls inference parameters and turn-taking behavior. All fields are optional.
inferenceConfiguration?:
maxTokens?:
temperature?:
topP?:
topK?:
stopSequences?:
turnDetectionConfiguration?:
endpointingSensitivity?:
toolChoice?:
enableKnowledgeGrounding?:
knowledgeBaseConfig?:
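As a rough illustration, the inference part of sessionConfig might be assembled as below. The field names come from the list above; the nesting of maxTokens, temperature, and topP under inferenceConfiguration, their numeric types, and the value ranges checked by the hypothetical checkInference helper are assumptions, so verify them against the package's type definitions.

```typescript
// Local, assumed shape for illustration; the real option types come from the package.
interface InferenceConfiguration {
  maxTokens?: number
  temperature?: number
  topP?: number
}

// Hypothetical helper: rejects values outside the assumed ranges before connect().
function checkInference(cfg: InferenceConfiguration): InferenceConfiguration {
  const inUnitRange = (v: number | undefined) => v === undefined || (v >= 0 && v <= 1)
  if (!inUnitRange(cfg.temperature)) throw new RangeError('temperature must be in [0, 1]')
  if (!inUnitRange(cfg.topP)) throw new RangeError('topP must be in [0, 1]')
  if (cfg.maxTokens !== undefined && (Math.floor(cfg.maxTokens) !== cfg.maxTokens || cfg.maxTokens < 1)) {
    throw new RangeError('maxTokens must be a positive integer')
  }
  return cfg
}

// Illustrative values only, not documented defaults.
const sessionConfig = {
  inferenceConfiguration: checkInference({ maxTokens: 1024, temperature: 0.7, topP: 0.9 }),
}

// Intended use (assumption): new NovaSonicVoice({ region: 'us-east-1', sessionConfig })
```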
Methods
connect()
Opens the bidirectional stream to AWS Bedrock and sends the initial session, prompt, and system events. Call this before speak, listen, or send.
options?:
Returns: Promise<void>
speak()
Synthesizes speech for a text prompt and emits speaking events as audio is produced.
input:
options?:
Returns: Promise<void>
send()
Streams microphone audio (or any PCM source) to the model. Use this for live, continuous conversation.
audioData:
Returns: Promise<void>
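Audio sources that produce floating-point samples (for example, Web Audio buffers) need converting before they count as 16-bit PCM. The sketch below shows one common conversion; floatTo16BitPCM is a hypothetical helper, and the sample rate and channel layout that send() expects are not covered here.

```typescript
// Hypothetical helper: converts float samples in [-1, 1] to signed 16-bit PCM.
function floatTo16BitPCM(samples: Float32Array): Int16Array {
  const pcm = new Int16Array(samples.length)
  for (let i = 0; i < samples.length; i++) {
    // Clamp first so out-of-range input saturates instead of wrapping around.
    const s = Math.max(-1, Math.min(1, samples[i]))
    // Negative values scale by 32768 and positive values by 32767 to use the full range.
    pcm[i] = Math.round(s < 0 ? s * 0x8000 : s * 0x7fff)
  }
  return pcm
}
```

Clamping before scaling matters because an Int16Array silently wraps values outside its range, which turns clipped peaks into loud clicks.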
listen()
Convenience wrapper that delegates to send(). Use it when you want a single transcription pass over a finite audio stream.
audioData:
Returns: Promise<void>
endAudioInput()
Signals the end of the current audio turn so the model can finalize its response. Call this when the user stops speaking and the provider is not configured for server-side turn detection.
Returns: Promise<void>
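Without server-side turn detection, the client has to decide when the user has stopped speaking. One simple approach, sketched below, is an energy threshold over incoming PCM frames. SilenceTracker and rmsLevel are hypothetical helpers, and the threshold and frame count are illustrative values that would need tuning for a real microphone.

```typescript
// Root-mean-square level of one 16-bit PCM frame, normalized to [0, 1].
function rmsLevel(frame: Int16Array): number {
  if (frame.length === 0) return 0
  let sumSquares = 0
  for (let i = 0; i < frame.length; i++) {
    const s = frame[i] / 32768
    sumSquares += s * s
  }
  return Math.sqrt(sumSquares / frame.length)
}

// Hypothetical helper: reports end of turn once speech has been heard and is
// followed by `maxSilentFrames` consecutive frames below `threshold`.
class SilenceTracker {
  private threshold: number
  private maxSilentFrames: number
  private silentFrames = 0
  private heardSpeech = false

  constructor(threshold = 0.01, maxSilentFrames = 25) {
    this.threshold = threshold
    this.maxSilentFrames = maxSilentFrames
  }

  // Returns true once per turn, on the frame that completes the silent run.
  push(frame: Int16Array): boolean {
    if (rmsLevel(frame) >= this.threshold) {
      this.heardSpeech = true
      this.silentFrames = 0
      return false
    }
    // Leading silence before any speech does not end the turn.
    if (!this.heardSpeech) return false
    this.silentFrames++
    if (this.silentFrames < this.maxSilentFrames) return false
    this.heardSpeech = false
    this.silentFrames = 0
    return true
  }
}

// Intended use (assumption): if (tracker.push(frame)) await voice.endAudioInput()
```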
addInstructions()
Updates the system prompt for the active session.
instructions?:
Returns: void
addTools()
Registers tools with the voice instance. When NovaSonicVoice is attached to an Agent, the Agent's tools are added automatically.
tools?:
Returns: void
getSpeakers()
Returns the list of voices supported by Nova 2 Sonic.
Returns: Promise<Array<{ voiceId: string; name: string; language: string; locale: string; gender: 'masculine' | 'feminine'; polyglot: boolean }>>
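The returned list can be filtered to choose a voice for a given locale. The sketch below uses the return shape shown above with a hypothetical pickSpeaker helper. Falling back to a polyglot voice when no native voice matches is an assumed policy, not something the provider does for you.

```typescript
// Shape taken from the getSpeakers() return type.
interface Speaker {
  voiceId: string
  name: string
  language: string
  locale: string
  gender: 'masculine' | 'feminine'
  polyglot: boolean
}

// Hypothetical helper: prefers a voice native to `locale`, then any polyglot voice.
function pickSpeaker(speakers: Speaker[], locale: string, gender?: Speaker['gender']): Speaker | undefined {
  const genderOk = (s: Speaker) => gender === undefined || s.gender === gender
  const native = speakers.filter(s => s.locale === locale && genderOk(s))
  if (native.length > 0) return native[0]
  // Assumed fallback policy: a polyglot voice covers locales without a native voice.
  return speakers.filter(s => s.polyglot && genderOk(s))[0]
}

// A few sample rows matching the documented voices, used here instead of a live call.
const sampleSpeakers: Speaker[] = [
  { voiceId: 'tiffany', name: 'Tiffany', language: 'English', locale: 'en-US', gender: 'feminine', polyglot: true },
  { voiceId: 'matthew', name: 'Matthew', language: 'English', locale: 'en-US', gender: 'masculine', polyglot: true },
  { voiceId: 'ambre', name: 'Ambre', language: 'French', locale: 'fr-FR', gender: 'feminine', polyglot: false },
  { voiceId: 'florian', name: 'Florian', language: 'French', locale: 'fr-FR', gender: 'masculine', polyglot: false },
]

// Intended use (assumption): pickSpeaker(await voice.getSpeakers(), 'fr-FR', 'feminine')
```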
getListener()
Returns whether the voice instance currently holds an open stream.
Returns: Promise<{ enabled: boolean }>
close()
Closes the bidirectional stream and destroys the underlying Bedrock client. Call this when the conversation ends.
Returns: void
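Because close() releases the stream and client, it is worth guaranteeing that it runs even when the conversation code throws. The sketch below wraps the connect and close lifecycle in a hypothetical withSession helper. It depends only on the two method signatures documented here, expressed as a local structural type.

```typescript
// Minimal structural type: only the two methods this sketch relies on.
interface SessionLike {
  connect(): Promise<void>
  close(): void
}

// Hypothetical helper: connects, runs `body`, and always closes afterwards,
// even when `body` rejects. If connect() itself fails, close() is not called.
async function withSession<V extends SessionLike, R>(
  voice: V,
  body: (voice: V) => Promise<R>,
): Promise<R> {
  await voice.connect()
  try {
    return await body(voice)
  } finally {
    voice.close()
  }
}

// Intended use (assumption):
//   await withSession(new NovaSonicVoice({ region: 'us-east-1' }), async v => {
//     await v.send(microphoneStream)
//   })
```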
on() / off()
Registers and removes event listeners. See Voice events for the shared event API.
Events
NovaSonicVoice emits the following events:
speaking:
writing:
toolCall:
interrupt:
turnComplete:
session:
usage:
error:
generationStage distinguishes provisional transcripts ('SPECULATIVE') from finalized ones ('FINAL'). Use 'FINAL' text for persistent storage and 'SPECULATIVE' text for live captions.
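One way to apply this split is to keep a single replaceable caption for provisional text and append only finalized text to the stored transcript. The sketch below assumes the writing payload fields from the usage example (text, role, generationStage). TranscriptBuffer is a hypothetical helper.

```typescript
// Event fields as used in the usage example; treating `role` as a plain string is an assumption.
interface WritingEvent {
  text: string
  role: string
  generationStage?: 'SPECULATIVE' | 'FINAL'
}

// Hypothetical helper: stores finalized lines and one replaceable live caption.
class TranscriptBuffer {
  readonly finalized: string[] = []
  caption = ''

  handle(event: WritingEvent): void {
    const line = `${event.role}: ${event.text}`
    if (event.generationStage === 'SPECULATIVE') {
      // Provisional text may be revised, so it only replaces the live caption.
      this.caption = line
      return
    }
    // A missing stage is treated as FINAL, like the `?? 'FINAL'` default in the usage example.
    this.finalized.push(line)
    this.caption = ''
  }
}

// Intended use (assumption): voice.on('writing', event => buffer.handle(event))
```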
Available voices
Nova 2 Sonic ships voices in ten locales. Tiffany and Matthew are polyglot and can speak any supported language.
| Voice ID | Name | Language | Locale | Gender | Polyglot |
|---|---|---|---|---|---|
| tiffany | Tiffany | English | en-US | feminine | yes |
| matthew | Matthew | English | en-US | masculine | yes |
| amy | Amy | English | en-GB | feminine | no |
| olivia | Olivia | English | en-AU | feminine | no |
| kiara | Kiara | English | en-IN | feminine | no |
| arjun | Arjun | English | en-IN | masculine | no |
| ambre | Ambre | French | fr-FR | feminine | no |
| florian | Florian | French | fr-FR | masculine | no |
| beatrice | Beatrice | Italian | it-IT | feminine | no |
| lorenzo | Lorenzo | Italian | it-IT | masculine | no |
| tina | Tina | German | de-DE | feminine | no |
| lennart | Lennart | German | de-DE | masculine | no |
| lupe | Lupe | Spanish | es-US | feminine | no |
| carlos | Carlos | Spanish | es-US | masculine | no |
| carolina | Carolina | Portuguese | pt-BR | feminine | no |
| leo | Leo | Portuguese | pt-BR | masculine | no |
| kiara | Kiara | Hindi | hi-IN | feminine | no |
| arjun | Arjun | Hindi | hi-IN | masculine | no |
Notes
- Audio is streamed as 16-bit PCM. Assistant audio is emitted as Int16Array on the speaking event.
- Call connect() on the voice instance before any other streaming method.
- close() destroys the underlying BedrockRuntimeClient to release the HTTP/2 session.
- Nova 2 Sonic is available in us-east-1, us-west-2, and ap-northeast-1. Other regions throw a configuration error during construction.
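Since an unsupported region fails at construction time, an application that takes the region from user configuration may want to check it first and report a friendlier message. The sketch below is a hypothetical pre-flight check that hard-codes the three regions listed in these notes. It is independent of the provider's own validation and would need updating if availability changes.

```typescript
// Regions listed in the notes; an assumption that this list stays current.
const NOVA_SONIC_REGIONS = ['us-east-1', 'us-west-2', 'ap-northeast-1']

// Hypothetical pre-flight check, separate from the provider's own validation.
function isSupportedRegion(region: string): boolean {
  return NOVA_SONIC_REGIONS.indexOf(region) !== -1
}

// Returns the requested region when supported, otherwise the given fallback.
function resolveRegion(requested: string | undefined, fallback = 'us-east-1'): string {
  return requested !== undefined && isSupportedRegion(requested) ? requested : fallback
}
```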