Inworld
The Inworld voice implementation in Mastra provides streaming text-to-speech (TTS) and batch speech-to-text (STT) capabilities using Inworld AI's API. It supports multiple TTS and STT models, configurable audio encodings, and progressive audio streaming.
For real-time, full-duplex speech-to-speech, the same package exports InworldRealtimeVoice.
Usage example
import { InworldVoice } from '@mastra/voice-inworld'
// Initialize with default configuration (uses INWORLD_API_KEY environment variable)
const defaultVoice = new InworldVoice()
// Initialize with custom configuration
const voice = new InworldVoice({
  speechModel: {
    name: 'inworld-tts-2',
    apiKey: 'your-api-key',
  },
  listeningModel: {
    name: 'groq/whisper-large-v3',
    apiKey: 'your-api-key',
  },
  speaker: 'Dennis',
})
// Text-to-Speech (streaming)
const audioStream = await voice.speak('Hello, world!')
// Speech-to-Text
const transcript = await voice.listen(audioStream)
Constructor parameters
speechModel?:
  name?:
  apiKey?:
listeningModel?:
  name?:
  apiKey?:
speaker?:
audioEncoding?:
sampleRateHertz?:
language?:
Methods
speak(input, options?)
Converts text to speech using Inworld's streaming TTS endpoint. Returns a readable stream that emits audio chunks progressively as they arrive.
const audioStream = await voice.speak('Hello, world!', {
  speaker: 'Olivia',
  audioEncoding: 'WAV',
  sampleRateHertz: 24000,
  speakingRate: 1.2,
  temperature: 0.8, // ignored on inworld-tts-2 (see Notes)
})
input:
options?:
  speaker?:
  audioEncoding?:
  sampleRateHertz?:
  speakingRate?:
  temperature?:
  deliveryMode?:
  language?:
Returns: Promise<NodeJS.ReadableStream>
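Because speak() resolves to a Node.js readable stream, the audio can be consumed chunk by chunk with async iteration. The following is a minimal sketch, not part of the package: collectAudio and fakeSpeak are hypothetical names, and fakeSpeak merely stands in for voice.speak() so the example runs without an API key.

```typescript
import { Readable } from 'node:stream'

// Hypothetical helper (not exported by @mastra/voice-inworld): drains any
// Node.js readable stream into a single Buffer and counts the chunks seen.
async function collectAudio(
  stream: NodeJS.ReadableStream,
): Promise<{ audio: Buffer; chunkCount: number }> {
  const chunks: Buffer[] = []
  for await (const chunk of stream) {
    // A stream may yield strings or Buffers; normalize everything to Buffer.
    chunks.push(Buffer.isBuffer(chunk) ? chunk : Buffer.from(chunk))
  }
  return { audio: Buffer.concat(chunks), chunkCount: chunks.length }
}

// Stand-in for voice.speak(): emits two small chunks. The bytes are
// placeholders, not real audio.
function fakeSpeak(): NodeJS.ReadableStream {
  return Readable.from([Buffer.from('RIFF'), Buffer.from('data')])
}

// With the real class this would presumably read:
//   const { audio } = await collectAudio(await voice.speak('Hello, world!'))
```

Collecting the whole stream gives up the benefit of progressive delivery, so this pattern suits cases such as saving a file; for playback, piping the stream to the output as chunks arrive is likely the better fit.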
listen(input, options?)
Converts speech to text using Inworld's batch STT endpoint.
const transcript = await voice.listen(audioStream, {
  audioEncoding: 'MP3',
  sampleRateHertz: 44100,
  language: 'ja-JP',
})
input:
options?:
  audioEncoding?:
  sampleRateHertz?:
  language?:
  numberOfChannels?:
Returns: Promise<string>
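When the recording is already in memory rather than in a stream, it can be wrapped in a stream before being passed to listen(). The sketch below assumes listen() accepts a Node.js readable stream, as the usage example suggests. The Listener interface, the option types, and transcribeBuffer are all hypothetical and cover only the single call used here.

```typescript
import { Readable } from 'node:stream'

// Assumed option types; the reference above lists the names only.
interface ListenOptions {
  audioEncoding?: string
  sampleRateHertz?: number
  language?: string
  numberOfChannels?: number
}

// Minimal structural type for the one method this sketch needs. The real
// InworldVoice class may declare a richer or slightly different signature.
interface Listener {
  listen(input: NodeJS.ReadableStream, options?: ListenOptions): Promise<string>
}

// Hypothetical helper: wraps an in-memory recording in a stream and
// forwards it, with any options, to listen().
async function transcribeBuffer(
  voice: Listener,
  audio: Buffer,
  options?: ListenOptions,
): Promise<string> {
  // Wrapping the Buffer in an array makes the stream emit it as one chunk.
  const input = Readable.from([audio])
  return voice.listen(input, options)
}
```

Typing the helper against a small interface rather than the concrete class keeps it easy to exercise with a stub in tests, without network access or an API key.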
getSpeakers()
Returns a list of available voices from the Inworld API.
const speakers = await voice.getSpeakers()
// [{ voiceId: 'Dennis', name: 'Dennis', language: 'en', description: '...', tags: ['friendly'], source: 'SYSTEM' }, ...]
Returns: Promise<Array<{ voiceId: string; name: string; language: string; description: string; tags: string[]; source: string }>>
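The returned array can be filtered locally to choose a voice. The sketch below reuses the field names from the return type above; findSpeakers is a hypothetical helper, and the sample entries other than the field layout are made up for illustration.

```typescript
// Shape taken from the documented return type of getSpeakers().
interface Speaker {
  voiceId: string
  name: string
  language: string
  description: string
  tags: string[]
  source: string
}

// Hypothetical helper: keeps voices for one language, optionally
// narrowed to those carrying a given tag.
function findSpeakers(speakers: Speaker[], language: string, tag?: string): Speaker[] {
  return speakers.filter(
    (s) => s.language === language && (tag === undefined || s.tags.includes(tag)),
  )
}

// Illustrative data only; real values come from voice.getSpeakers().
const sampleSpeakers: Speaker[] = [
  { voiceId: 'Dennis', name: 'Dennis', language: 'en', description: '...', tags: ['friendly'], source: 'SYSTEM' },
  { voiceId: 'ExampleA', name: 'ExampleA', language: 'en', description: '...', tags: ['calm'], source: 'SYSTEM' },
  { voiceId: 'ExampleB', name: 'ExampleB', language: 'ja', description: '...', tags: ['friendly'], source: 'SYSTEM' },
]

const friendlyEnglish = findSpeakers(sampleSpeakers, 'en', 'friendly')
```

A voiceId selected this way can then be passed as the speaker option, as in the speak() example.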
Notes
- The TTS endpoint uses progressive NDJSON streaming, so audio playback can begin before the full response is received.
- An API key can be provided via the speechModel or listeningModel config, or the INWORLD_API_KEY environment variable. TTS and STT keys are resolved independently: passing distinct speechModel.apiKey and listeningModel.apiKey values lets each service use its own credential. If only one is provided, it is reused for both services as a fallback before the env var.
- inworld-tts-2 is the default flagship model. Use deliveryMode (STABLE | BALANCED | CREATIVE) to steer delivery style on this model. The temperature option is ignored on inworld-tts-2.
- The inworld-tts-1.5-mini model offers lower latency at the cost of reduced voice quality compared to inworld-tts-1.5-max.
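The API key fallback order described in the notes can be sketched as a small pure function. This is an illustration of the documented order only, not the package's actual implementation: resolveApiKeys and KeyConfig are hypothetical names, and the treatment of empty strings is an assumption.

```typescript
// Only the fields relevant to key resolution.
interface KeyConfig {
  speechModel?: { apiKey?: string }
  listeningModel?: { apiKey?: string }
}

// Hypothetical sketch of the documented order: each service prefers its
// own key, then the other service's key, then INWORLD_API_KEY.
function resolveApiKeys(
  config: KeyConfig,
  env: Record<string, string | undefined>,
): { tts?: string; stt?: string } {
  const speechKey = config.speechModel?.apiKey
  const listeningKey = config.listeningModel?.apiKey
  const envKey = env['INWORLD_API_KEY']
  return {
    // `??` skips only undefined/null; an empty string would be kept as-is.
    tts: speechKey ?? listeningKey ?? envKey,
    stt: listeningKey ?? speechKey ?? envKey,
  }
}

// Distinct keys: each service keeps its own credential.
const distinctKeys = resolveApiKeys(
  { speechModel: { apiKey: 'tts-key' }, listeningModel: { apiKey: 'stt-key' } },
  { INWORLD_API_KEY: 'env-key' },
)

// Only one key given: it is reused for both services ahead of the env var.
const sharedKey = resolveApiKeys(
  { speechModel: { apiKey: 'tts-key' } },
  { INWORLD_API_KEY: 'env-key' },
)
```

Taking the environment as a parameter instead of reading process.env directly keeps the function deterministic; in application code, process.env can be passed in as the second argument.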