voice.speak()

The speak() method is a core function available in all Mastra voice providers that converts text to speech. It takes text input and returns an audio stream that can be played or saved.
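
Its shape can be sketched as a simplified TypeScript signature (the exact typings in @mastra/core may differ):

speak(
  input: string | NodeJS.ReadableStream,
  options?: {
    speaker?: string;
    [key: string]: unknown; // provider-specific options, described below
  },
): Promise<NodeJS.ReadableStream | void>;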

Parameters

input: string | NodeJS.ReadableStream
Text to convert to speech. Can be a string or a readable stream of text.

options?: object
Options for speech synthesis.

options.speaker?: string
Voice ID to use for this specific request. Overrides the default speaker set in the constructor.

Return Value

Returns a Promise<NodeJS.ReadableStream | void> where:

  • NodeJS.ReadableStream: A stream of audio data that can be played or saved
  • void: When using a realtime voice provider that emits audio through events instead of returning it directly
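
Because the return type is a union, narrow it before using the result. A minimal sketch, assuming voice is an already-configured, non-realtime provider:

import { createWriteStream } from "fs";

const result = await voice.speak("Hello, world!");
if (result) {
  // Non-realtime providers resolve with a readable audio stream
  result.pipe(createWriteStream("speech.mp3")); // extension depends on the provider's output format
} else {
  // Realtime providers resolve with void and emit audio through events instead
}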

Provider-Specific Options

Each voice provider may support additional options specific to their implementation. Here are some examples:

OpenAI

options.speed?: number = 1.0
Speech speed multiplier. Values between 0.25 and 4.0 are supported.
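
For example (a sketch, assuming voice is an OpenAIVoice instance as in the usage example further down):

// Slow the delivery slightly; values outside 0.25-4.0 are not supported
const slowAudio = await voice.speak("Take it slow.", { speed: 0.8 });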

ElevenLabs

options.stability?: number = 0.5
Voice stability. Higher values result in more stable, less expressive speech.

options.similarity_boost?: number = 0.75
Voice clarity and similarity to the original voice.
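
A sketch of passing these options, assuming an ElevenLabs provider from @mastra/voice-elevenlabs that accepts them directly on speak():

import { ElevenLabsVoice } from "@mastra/voice-elevenlabs";

const voice = new ElevenLabsVoice();
// Hedged: option names follow the table above; the provider's exact option
// shape may differ
const audio = await voice.speak("Hello!", {
  stability: 0.8, // steadier, less expressive delivery
  similarity_boost: 0.9, // stay closer to the reference voice
});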

Google

options.languageCode?: string
Language code for the voice (e.g., 'en-US').

options.audioConfig?: object = { audioEncoding: 'LINEAR16' }
Audio configuration options from the Google Cloud Text-to-Speech API.
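
A sketch, assuming a Google provider from @mastra/voice-google:

import { GoogleVoice } from "@mastra/voice-google";

const voice = new GoogleVoice();
// Switch the language and override the LINEAR16 default encoding
const audio = await voice.speak("Bonjour tout le monde !", {
  languageCode: "fr-FR",
  audioConfig: { audioEncoding: "MP3" },
});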

Murf

options.properties.rate?: number
Speech rate multiplier.

options.properties.pitch?: number
Voice pitch adjustment.

options.properties.format?: 'MP3' | 'WAV' | 'FLAC' | 'ALAW' | 'ULAW'
Output audio format.
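
A sketch, assuming a Murf provider from @mastra/voice-murf and that these fields nest under a properties key, as the option names suggest:

import { MurfVoice } from "@mastra/voice-murf";

const voice = new MurfVoice();
const audio = await voice.speak("Hello!", {
  properties: {
    rate: 1.1, // slightly faster than the default
    pitch: 1.0, // hedged: the scale of this adjustment is provider-defined
    format: "WAV",
  },
});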

Usage Example

import { Readable } from "stream";
import { OpenAIVoice } from "@mastra/voice-openai";

// Initialize a voice provider
const voice = new OpenAIVoice({
  speaker: "alloy", // Default voice
});

// Basic usage with default settings
const audioStream = await voice.speak("Hello, world!");

// Using a different voice for this specific request
const audioStreamWithDifferentVoice = await voice.speak("Hello again!", {
  speaker: "nova",
});

// Using provider-specific options
const audioStreamWithOptions = await voice.speak("Hello with options!", {
  speaker: "echo",
  speed: 1.2, // OpenAI-specific option
});

// Using a text stream as input
const textStream = Readable.from(["Hello", " from", " a", " stream!"]);
const audioStreamFromTextStream = await voice.speak(textStream);

Using with CompositeVoice

When using CompositeVoice, the speak() method delegates to the configured speaking provider:

import { CompositeVoice } from "@mastra/core/voice";
import { OpenAIVoice } from "@mastra/voice-openai";
import { PlayAIVoice } from "@mastra/voice-playai";

const voice = new CompositeVoice({
  output: new PlayAIVoice(),
  input: new OpenAIVoice(),
});

// This will use the PlayAIVoice provider
const audioStream = await voice.speak("Hello, world!");

Using AI SDK Model Providers

You can also use AI SDK speech models directly with CompositeVoice:

import { CompositeVoice } from "@mastra/core/voice";
import { openai } from "@ai-sdk/openai";
import { elevenlabs } from "@ai-sdk/elevenlabs";

// Use AI SDK speech models
const voice = new CompositeVoice({
  output: elevenlabs.speech('eleven_turbo_v2'), // AI SDK model
  input: openai.transcription('whisper-1'), // AI SDK model
});

// Works the same way
const audioStream = await voice.speak("Hello from AI SDK!");

// Provider-specific options can be passed through
const audioWithOptions = await voice.speak("Hello with options!", {
  speaker: 'Rachel', // ElevenLabs voice
  providerOptions: {
    elevenlabs: {
      stability: 0.5,
      similarity_boost: 0.75,
    },
  },
});

See the CompositeVoice reference for more details on AI SDK integration.

Realtime Voice Providers

When using realtime voice providers like OpenAIRealtimeVoice, the speak() method behaves differently:

  • Instead of returning an audio stream, it emits a 'speaker' event that carries a stream of audio data
  • You need to register an event listener to receive the audio chunks

import { OpenAIRealtimeVoice } from "@mastra/voice-openai-realtime";
import Speaker from "@mastra/node-speaker";

const speaker = new Speaker({
  sampleRate: 24000, // Audio sample rate in Hz - matches the 24 kHz PCM audio the OpenAI realtime API emits
  channels: 1, // Mono audio output (as opposed to stereo, which would be 2)
  bitDepth: 16, // Bit depth for audio quality - CD-quality standard (16-bit resolution)
});

const voice = new OpenAIRealtimeVoice();
await voice.connect();

// Register an event listener for audio chunks
voice.on("speaker", (stream) => {
  // Handle the audio stream (e.g., play it or save it)
  stream.pipe(speaker);
});

// This emits 'speaker' events instead of returning a stream
await voice.speak("Hello, this is realtime speech!");

Notes

  • The behavior of speak() may vary slightly between providers, but all implementations follow the same basic interface.
  • When using a realtime voice provider, the method might not return an audio stream directly but instead emit a 'speaker' event.
  • If a text stream is provided as input, the provider will typically convert it to a string before processing.
  • The audio format of the returned stream depends on the provider. Common formats include MP3, WAV, and OGG.
  • For best performance, consider closing or ending the audio stream when you're done with it.