Inworld

The Inworld voice implementation in Mastra provides streaming text-to-speech (TTS) and batch speech-to-text (STT) capabilities using Inworld AI's API. It supports multiple TTS and STT models, configurable audio encodings, and progressive audio streaming.

For real-time, full-duplex speech-to-speech, the same package exports InworldRealtimeVoice.

Usage example

import { InworldVoice } from '@mastra/voice-inworld'

// Initialize with default configuration (uses INWORLD_API_KEY environment variable)
const voice = new InworldVoice()

// Or initialize with custom configuration (a separate name avoids redeclaring `voice`)
const customVoice = new InworldVoice({
  speechModel: {
    name: 'inworld-tts-2',
    apiKey: 'your-api-key',
  },
  listeningModel: {
    name: 'groq/whisper-large-v3',
    apiKey: 'your-api-key',
  },
  speaker: 'Dennis',
})

// Text-to-Speech (streaming)
const audioStream = await voice.speak('Hello, world!')

// Speech-to-Text
const transcript = await voice.listen(audioStream)
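Because `speak()` resolves to a readable stream rather than a finished file, callers that need the whole clip at once (for example, to attach it to an HTTP response) must drain the stream themselves. The sketch below does this with plain Node.js async iteration; `collectAudio` is a hypothetical helper, not an export of `@mastra/voice-inworld`.

```typescript
// Hypothetical helper: drain a readable audio stream into a single Buffer.
// Works with any NodeJS.ReadableStream, such as the one voice.speak() resolves to.
async function collectAudio(stream: NodeJS.ReadableStream): Promise<Buffer> {
  const chunks: Buffer[] = [];
  for await (const chunk of stream) {
    // Chunks arrive as Buffers, or as strings if an encoding was set on the stream.
    chunks.push(typeof chunk === 'string' ? Buffer.from(chunk) : chunk);
  }
  return Buffer.concat(chunks);
}

// Usage (inside an async function):
//   const clip = await collectAudio(await voice.speak('Hello, world!'))
```

Buffering like this gives up progressive playback, so it suits short clips; for long responses, piping the stream directly to its destination keeps memory use flat.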

Constructor parameters

speechModel?:

InworldVoiceConfig

= { name: 'inworld-tts-2' }

Configuration for text-to-speech functionality.

InworldVoiceConfig

name?:

'inworld-tts-2' | 'inworld-tts-1.5-max' | 'inworld-tts-1.5-mini'

The Inworld TTS model to use.

apiKey?:

string

Inworld API key. Falls back to INWORLD_API_KEY environment variable.

listeningModel?:

InworldListeningConfig

= { name: 'groq/whisper-large-v3' }

Configuration for speech-to-text functionality.

InworldListeningConfig

name?:

'groq/whisper-large-v3'

The Inworld STT model to use.

apiKey?:

string

Inworld API key. Falls back to INWORLD_API_KEY environment variable.

speaker?:

string

= 'Dennis'

Default voice ID to use for text-to-speech.

audioEncoding?:

AudioEncoding

= 'MP3'

Default audio encoding for TTS output.

sampleRateHertz?:

number

= 48000

Default sample rate for TTS output.

language?:

string

= 'en-US'

Default BCP-47 language code for STT.
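As a compact reference, the documented defaults can be expressed as a small merge function. This is a hypothetical sketch: `VoiceOptions` and `applyDefaults` are illustrative names, not exports of `@mastra/voice-inworld`, and it assumes each omitted option simply falls back to the default listed above.

```typescript
// Hypothetical types mirroring the constructor options documented above.
type TtsModel = 'inworld-tts-2' | 'inworld-tts-1.5-max' | 'inworld-tts-1.5-mini';

interface VoiceOptions {
  speechModel?: { name?: TtsModel; apiKey?: string };
  listeningModel?: { name?: 'groq/whisper-large-v3'; apiKey?: string };
  speaker?: string;
  audioEncoding?: string; // the real AudioEncoding type is narrower than string
  sampleRateHertz?: number;
  language?: string;
}

// Fill in the documented default for every option the caller omitted.
function applyDefaults(options: VoiceOptions = {}) {
  return {
    speechModelName: options.speechModel?.name ?? 'inworld-tts-2',
    listeningModelName: options.listeningModel?.name ?? 'groq/whisper-large-v3',
    speaker: options.speaker ?? 'Dennis',
    audioEncoding: options.audioEncoding ?? 'MP3',
    sampleRateHertz: options.sampleRateHertz ?? 48000,
    language: options.language ?? 'en-US',
  };
}
```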

Methods

`speak(input, options?)`

Converts text to speech using Inworld's streaming TTS endpoint. Returns a readable stream that emits audio chunks progressively as they arrive.

const audioStream = await voice.speak('Hello, world!', {
  speaker: 'Olivia',
  audioEncoding: 'WAV',
  sampleRateHertz: 24000,
  speakingRate: 1.2,
  temperature: 0.8, // ignored by inworld-tts-2; honored on inworld-tts-1.5-* models
})

input:

string | NodeJS.ReadableStream

Text to convert to speech. If a stream is provided, its contents are read into a string before synthesis.

options?:

InworldSpeakOptions

Additional options for speech synthesis.

InworldSpeakOptions

speaker?:

string

Override the default speaker for this request.

audioEncoding?:

AudioEncoding

Override the default audio encoding.

sampleRateHertz?:

number

Override the default sample rate.

speakingRate?:

number

Adjust the speaking rate.

temperature?:

number

Controls voice variability. Honored on `inworld-tts-1.5-*` models; ignored by `inworld-tts-2`.

deliveryMode?:

'STABLE' | 'BALANCED' | 'CREATIVE'

Steering control for delivery style. Only honored by `inworld-tts-2`.

language?:

string

BCP-47 language code for this request. Auto-detected when omitted.

Returns: Promise<NodeJS.ReadableStream>
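Since `temperature` and `deliveryMode` are each honored by only some models, it can help to check options against the target model before calling `speak()`. The sketch below is hypothetical (`optionsForModel` is not part of the package) and encodes only the rules stated above: `deliveryMode` applies to `inworld-tts-2`, `temperature` to the `inworld-tts-1.5-*` models.

```typescript
type TtsModel = 'inworld-tts-2' | 'inworld-tts-1.5-max' | 'inworld-tts-1.5-mini';
type DeliveryMode = 'STABLE' | 'BALANCED' | 'CREATIVE';

// Hypothetical mirror of the documented InworldSpeakOptions fields.
interface SpeakOptions {
  speaker?: string;
  audioEncoding?: string;
  sampleRateHertz?: number;
  speakingRate?: number;
  temperature?: number;
  deliveryMode?: DeliveryMode;
  language?: string;
}

// Return a copy of the options without fields the model ignores,
// plus the names of the dropped fields so callers can log a warning.
function optionsForModel(model: TtsModel, options: SpeakOptions): { options: SpeakOptions; ignored: string[] } {
  const { temperature, deliveryMode, ...rest } = options;
  const result: SpeakOptions = { ...rest };
  const ignored: string[] = [];
  if (model === 'inworld-tts-2') {
    if (deliveryMode !== undefined) result.deliveryMode = deliveryMode;
    if (temperature !== undefined) ignored.push('temperature');
  } else {
    if (temperature !== undefined) result.temperature = temperature;
    if (deliveryMode !== undefined) ignored.push('deliveryMode');
  }
  return { options: result, ignored };
}
```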

`listen(input, options?)`

Converts speech to text using Inworld's batch STT endpoint.

const transcript = await voice.listen(audioStream, {
  audioEncoding: 'MP3',
  sampleRateHertz: 44100,
  language: 'ja-JP',
})

input:

NodeJS.ReadableStream

Audio stream to transcribe.

options?:

InworldListenOptions

Additional options for transcription.

InworldListenOptions

audioEncoding?:

'LINEAR16' | 'MP3' | 'OGG_OPUS' | 'FLAC' | 'AUTO_DETECT'

Audio encoding of the input stream.

sampleRateHertz?:

number

Sample rate of the input audio.

language?:

string

BCP-47 language code for transcription.

numberOfChannels?:

number

Number of audio channels in the input.

Returns: Promise<string>
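When transcribing local files, the `audioEncoding` option can be derived from the file extension. The following is a hypothetical sketch (`encodingForFile` is an illustrative name) that maps only unambiguous extensions and otherwise uses `'AUTO_DETECT'`; `.ogg` is left to auto-detection because the container may hold Vorbis rather than Opus, and `.wav` because a WAV file is not guaranteed to contain LINEAR16 PCM.

```typescript
type ListenEncoding = 'LINEAR16' | 'MP3' | 'OGG_OPUS' | 'FLAC' | 'AUTO_DETECT';

// Hypothetical helper: pick a listen() audioEncoding from a file name.
function encodingForFile(fileName: string): ListenEncoding {
  const dot = fileName.lastIndexOf('.');
  const ext = dot === -1 ? '' : fileName.slice(dot + 1).toLowerCase();
  switch (ext) {
    case 'mp3':
      return 'MP3';
    case 'flac':
      return 'FLAC';
    case 'opus':
      return 'OGG_OPUS';
    default:
      // Ambiguous or unknown formats are left to the service to detect.
      return 'AUTO_DETECT';
  }
}
```

A file opened with Node's `createReadStream(path)` is a `NodeJS.ReadableStream`, so it can be passed as `voice.listen(createReadStream(path), { audioEncoding: encodingForFile(path) })`.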

`getSpeakers()`

Returns a list of available voices from the Inworld API.

const speakers = await voice.getSpeakers()
// [{ voiceId: 'Dennis', name: 'Dennis', language: 'en', description: '...', tags: ['friendly'], source: 'SYSTEM' }, ...]

Returns: Promise<Array<{ voiceId: string; name: string; language: string; description: string; tags: string[]; source: string }>>
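The returned list can be narrowed before choosing a `speaker`. This hypothetical sketch (`pickSpeakers` is an illustrative name) relies only on the documented return shape: it keeps voices whose `language` equals the requested code or starts with it followed by `-`, optionally requiring a tag, and returns their `voiceId` values.

```typescript
// Shape of each entry returned by getSpeakers(), as documented above.
interface Speaker {
  voiceId: string;
  name: string;
  language: string;
  description: string;
  tags: string[];
  source: string;
}

// Hypothetical helper: voice IDs matching a language (e.g. 'en') and an optional tag.
function pickSpeakers(speakers: Speaker[], language: string, tag?: string): string[] {
  const lang = language.toLowerCase();
  return speakers
    .filter((s) => {
      const l = s.language.toLowerCase();
      return l === lang || l.startsWith(lang + '-');
    })
    .filter((s) => tag === undefined || s.tags.includes(tag))
    .map((s) => s.voiceId);
}

// Usage (inside an async function):
//   const [voiceId] = pickSpeakers(await voice.getSpeakers(), 'en', 'friendly')
//   const audio = await voice.speak('Hi there', { speaker: voiceId })
```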

Notes

The TTS endpoint uses progressive NDJSON streaming, so audio playback can begin before the full response is received.
An API key can be provided via the speechModel or listeningModel config, or via the INWORLD_API_KEY environment variable. TTS and STT keys are resolved independently: passing distinct speechModel.apiKey and listeningModel.apiKey values lets each service use its own credential. If only one is provided, it is reused for the other service before falling back to the environment variable.
inworld-tts-2 is the default flagship model. Use deliveryMode (STABLE | BALANCED | CREATIVE) to steer delivery style on this model. The temperature option is ignored on inworld-tts-2.
The inworld-tts-1.5-mini model offers lower latency at the cost of reduced voice quality compared to inworld-tts-1.5-max.
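The key-resolution order described in these notes can be summarized in a few lines. This is a hypothetical sketch of the documented behavior (`resolveApiKeys` is an illustrative name, not the package's internal code): each service prefers its own key, then the other service's key, then INWORLD_API_KEY.

```typescript
interface ResolvedKeys {
  tts?: string;
  stt?: string;
}

// Hypothetical helper following the documented fallback order.
function resolveApiKeys(
  speechKey: string | undefined,
  listeningKey: string | undefined,
  env: Record<string, string | undefined> = process.env,
): ResolvedKeys {
  const envKey = env.INWORLD_API_KEY;
  return {
    tts: speechKey ?? listeningKey ?? envKey, // own key, then STT key, then env
    stt: listeningKey ?? speechKey ?? envKey, // own key, then TTS key, then env
  };
}
```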
