# voice.speak()
The `speak()` method is a core method available on all Mastra voice providers that converts text to speech. It accepts text input (a string or a readable stream) and returns an audio stream that can be played or saved.
## Parameters

| Parameter | Type | Description |
| --- | --- | --- |
| `input` | `string \| NodeJS.ReadableStream` | Text to convert to speech. Can be a string or a readable stream of text. |
| `options?` | `object` | Options for speech synthesis. |
| `options.speaker?` | `string` | Voice ID to use for this specific request. Overrides the default speaker set in the constructor. |
## Return Value

Returns a `Promise<NodeJS.ReadableStream | void>`, where:

- `NodeJS.ReadableStream`: a stream of audio data that can be played or saved
- `void`: returned when using a realtime voice provider that emits audio through events instead of returning it directly
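Since the return type is a union, it helps to guard for a stream before consuming the result. A minimal sketch (the `output.mp3` filename is illustrative; the actual audio format depends on the provider):

```typescript
import { createWriteStream } from "fs";
import { OpenAIVoice } from "@mastra/voice-openai";

const voice = new OpenAIVoice();
const result = await voice.speak("Hello, world!");

// Realtime providers return void, so check before piping
if (result) {
  result.pipe(createWriteStream("output.mp3"));
}
```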
## Provider-Specific Options
Each voice provider may support additional options specific to their implementation. Here are some examples:
### OpenAI

| Option | Type | Default | Description |
| --- | --- | --- | --- |
| `options.speed?` | `number` | `1.0` | Speech speed multiplier. Values between 0.25 and 4.0 are supported. |
### ElevenLabs

| Option | Type | Default | Description |
| --- | --- | --- | --- |
| `options.stability?` | `number` | `0.5` | Voice stability. Higher values result in more stable, less expressive speech. |
| `options.similarity_boost?` | `number` | `0.75` | Voice clarity and similarity to the original voice. |
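As a rough sketch of how these options are passed, assuming the `ElevenLabsVoice` provider from `@mastra/voice-elevenlabs` with credentials configured via environment variables (option values here are illustrative):

```typescript
import { ElevenLabsVoice } from "@mastra/voice-elevenlabs";

const voice = new ElevenLabsVoice();

// Higher stability trades expressiveness for consistency
const audio = await voice.speak("Hello from ElevenLabs!", {
  stability: 0.8,
  similarity_boost: 0.75,
});
```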
### Google

| Option | Type | Default | Description |
| --- | --- | --- | --- |
| `options.languageCode?` | `string` | — | Language code for the voice (e.g., `'en-US'`). |
| `options.audioConfig?` | `object` | `{ audioEncoding: 'LINEAR16' }` | Audio configuration options from the Google Cloud Text-to-Speech API. |
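A sketch of passing these options, assuming the `GoogleVoice` provider from `@mastra/voice-google` (the provider class name and option values here are assumptions for illustration):

```typescript
import { GoogleVoice } from "@mastra/voice-google";

const voice = new GoogleVoice();

// Request a British English voice and MP3 output instead of the defaults
const audio = await voice.speak("Hello from Google!", {
  languageCode: "en-GB",
  audioConfig: { audioEncoding: "MP3" },
});
```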
### Murf

| Option | Type | Description |
| --- | --- | --- |
| `options.properties.rate?` | `number` | Speech rate multiplier. |
| `options.properties.pitch?` | `number` | Voice pitch adjustment. |
| `options.properties.format?` | `'MP3' \| 'WAV' \| 'FLAC' \| 'ALAW' \| 'ULAW'` | Output audio format. |
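Note that Murf nests its options under a `properties` key. A sketch assuming the `MurfVoice` provider from `@mastra/voice-murf` (the provider class name and option values are illustrative):

```typescript
import { MurfVoice } from "@mastra/voice-murf";

const voice = new MurfVoice();

// Murf-specific options are nested under `properties`
const audio = await voice.speak("Hello from Murf!", {
  properties: {
    rate: 1.1, // slightly faster speech
    format: "WAV",
  },
});
```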
## Usage Example
```typescript
import { Readable } from "stream";
import { OpenAIVoice } from "@mastra/voice-openai";

// Initialize a voice provider
const voice = new OpenAIVoice({
  speaker: "alloy", // Default voice
});

// Basic usage with default settings
const audioStream = await voice.speak("Hello, world!");

// Using a different voice for this specific request
const audioStreamWithDifferentVoice = await voice.speak("Hello again!", {
  speaker: "nova",
});

// Using provider-specific options
const audioStreamWithOptions = await voice.speak("Hello with options!", {
  speaker: "echo",
  speed: 1.2, // OpenAI-specific option
});

// Using a text stream as input
const textStream = Readable.from(["Hello", " from", " a", " stream!"]);
const audioStreamFromTextStream = await voice.speak(textStream);
```
## Using with CompositeVoice
When using CompositeVoice, the speak() method delegates to the configured speaking provider:
```typescript
import { CompositeVoice } from "@mastra/core/voice";
import { OpenAIVoice } from "@mastra/voice-openai";
import { PlayAIVoice } from "@mastra/voice-playai";

const voice = new CompositeVoice({
  output: new PlayAIVoice(),
  input: new OpenAIVoice(),
});

// This will use the PlayAIVoice provider
const audioStream = await voice.speak("Hello, world!");
```
## Using AI SDK Model Providers
You can also use AI SDK speech models directly with CompositeVoice:
```typescript
import { CompositeVoice } from "@mastra/core/voice";
import { openai } from "@ai-sdk/openai";
import { elevenlabs } from "@ai-sdk/elevenlabs";

// Use AI SDK speech models
const voice = new CompositeVoice({
  output: elevenlabs.speech("eleven_turbo_v2"), // AI SDK model
  input: openai.transcription("whisper-1"), // AI SDK model
});

// Works the same way
const audioStream = await voice.speak("Hello from AI SDK!");

// Provider-specific options can be passed through
const audioWithOptions = await voice.speak("Hello with options!", {
  speaker: "Rachel", // ElevenLabs voice
  providerOptions: {
    elevenlabs: {
      stability: 0.5,
      similarity_boost: 0.75,
    },
  },
});
```
See the CompositeVoice reference for more details on AI SDK integration.
## Realtime Voice Providers
When using realtime voice providers like OpenAIRealtimeVoice, the speak() method behaves differently:
- Instead of returning an audio stream, it emits a 'speaker' event with the audio data
- You need to register an event listener to receive the audio chunks
```typescript
import { OpenAIRealtimeVoice } from "@mastra/voice-openai-realtime";
import Speaker from "@mastra/node-speaker";

const speaker = new Speaker({
  sampleRate: 24000, // Audio sample rate in Hz - matches the 24 kHz audio emitted by OpenAI realtime
  channels: 1, // Mono audio output (as opposed to stereo, which would be 2)
  bitDepth: 16, // Bit depth for audio quality - CD-quality standard (16-bit resolution)
});

const voice = new OpenAIRealtimeVoice();
await voice.connect();

// Register event listener for audio chunks
voice.on("speaker", (stream) => {
  // Handle the audio stream (e.g., play it or save it)
  stream.pipe(speaker);
});

// This will emit 'speaker' events instead of returning a stream
await voice.speak("Hello, this is realtime speech!");
```
## Notes
- The behavior of `speak()` may vary slightly between providers, but all implementations follow the same basic interface.
- When using a realtime voice provider, the method may not return an audio stream directly but instead emit a 'speaker' event.
- If a text stream is provided as input, the provider will typically convert it to a string before processing.
- The audio format of the returned stream depends on the provider. Common formats include MP3, WAV, and OGG.
- For best performance, close or end the audio stream when you're done with it (see the sketch below).
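As a sketch of that last note, using standard Node.js stream APIs (the filename is illustrative):

```typescript
import { createWriteStream } from "fs";
import { OpenAIVoice } from "@mastra/voice-openai";

const voice = new OpenAIVoice();
const audioStream = await voice.speak("A long passage of text...");

if (audioStream) {
  const file = createWriteStream("speech.mp3");
  audioStream.pipe(file);

  file.on("finish", () => {
    // The audio is fully written; the write stream closes itself here
    console.log("Audio saved to speech.mp3");
  });

  // If you abandon playback or saving early, tear both streams down
  // explicitly so the underlying resources are released:
  // audioStream.destroy();
  // file.destroy();
}
```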