# voice.speak()

The `speak()` method is a core function available in all Mastra voice providers that converts text to speech. It takes text input and returns an audio stream that can be played or saved.

## Parameters

**input** (`string | NodeJS.ReadableStream`): Text to convert to speech. Can be a string or a readable stream of text.

**options** (`object`): Options for speech synthesis.

**options.speaker** (`string`): Voice ID to use for this specific request. Overrides the default speaker set in the constructor.

## Return value

Returns a `Promise<NodeJS.ReadableStream | void>` where:

- `NodeJS.ReadableStream`: A stream of audio data that can be played or saved
- `void`: When using a realtime voice provider that emits audio through events instead of returning it directly

## Provider-specific options

Each voice provider may support additional options specific to its implementation. Here are some examples:

### OpenAI

**options** (`Options`): Configuration options.

**options.speed** (`number`): Speech speed multiplier. Values between 0.25 and 4.0 are supported.

### ElevenLabs

**options** (`Options`): Configuration options.

**options.stability** (`number`): Voice stability. Higher values result in more stable, less expressive speech.

**options.similarity_boost** (`number`): Voice clarity and similarity to the original voice.

### Google

**options** (`Options`): Configuration options.

**options.languageCode** (`string`): Language code for the voice (e.g., `'en-US'`).

**options.audioConfig** (`object`): Audio configuration options from the Google Cloud Text-to-Speech API.

### Murf

**options** (`Options`): Configuration options.

**options.properties** (`object`): Properties configuration.

**options.properties.rate** (`number`): Speech rate multiplier.

**options.properties.pitch** (`number`): Voice pitch adjustment.

**options.properties.format** (`'MP3' | 'WAV' | 'FLAC' | 'ALAW' | 'ULAW'`): Output audio format.
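Because `speak()` resolves to a standard `NodeJS.ReadableStream`, saving the result to disk needs no provider-specific code. A minimal sketch using Node's built-in stream utilities (the `saveAudio` helper name is our own, not part of the Mastra API):

```typescript
import { createWriteStream } from 'fs'
import { pipeline } from 'stream/promises'

// Pipe any audio stream returned by speak() to a file on disk.
// pipeline() handles backpressure and cleans up both streams on error.
async function saveAudio(audio: NodeJS.ReadableStream, path: string): Promise<void> {
  await pipeline(audio, createWriteStream(path))
}

// Usage (assuming `voice` is a configured provider):
// const audio = await voice.speak('Hello, world!')
// await saveAudio(audio, 'hello.mp3')
```

The file extension should match the provider's output format (see the provider-specific options above).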
## Usage example

```typescript
import { OpenAIVoice } from '@mastra/voice-openai'
import { Readable } from 'stream'

// Initialize a voice provider
const voice = new OpenAIVoice({
  speaker: 'alloy', // Default voice
})

// Basic usage with default settings
const audioStream = await voice.speak('Hello, world!')

// Using a different voice for this specific request
const audioStreamWithDifferentVoice = await voice.speak('Hello again!', {
  speaker: 'nova',
})

// Using provider-specific options
const audioStreamWithOptions = await voice.speak('Hello with options!', {
  speaker: 'echo',
  speed: 1.2, // OpenAI-specific option
})

// Using a text stream as input
const textStream = Readable.from(['Hello', ' from', ' a', ' stream!'])
const audioStreamFromTextStream = await voice.speak(textStream)
```

## Using with `CompositeVoice`

When using `CompositeVoice`, the `speak()` method delegates to the configured speaking provider:

```typescript
import { CompositeVoice } from '@mastra/core/voice'
import { OpenAIVoice } from '@mastra/voice-openai'
import { PlayAIVoice } from '@mastra/voice-playai'

const voice = new CompositeVoice({
  output: new PlayAIVoice(),
  input: new OpenAIVoice(),
})

// This will use the PlayAIVoice provider
const audioStream = await voice.speak('Hello, world!')
```

### Using AI SDK Model Providers

You can also use AI SDK speech models directly with `CompositeVoice`:

```typescript
import { CompositeVoice } from '@mastra/core/voice'
import { openai } from '@ai-sdk/openai'
import { elevenlabs } from '@ai-sdk/elevenlabs'

// Use AI SDK speech models
const voice = new CompositeVoice({
  output: elevenlabs.speech('eleven_turbo_v2'), // AI SDK model
  input: openai.transcription('whisper-1'), // AI SDK model
})

// Works the same way
const audioStream = await voice.speak('Hello from AI SDK!')

// Provider-specific options can be passed through
const audioWithOptions = await voice.speak('Hello with options!', {
  speaker: 'Rachel', // ElevenLabs voice
  providerOptions: {
    elevenlabs: {
      stability: 0.5,
      similarity_boost: 0.75,
    },
  },
})
```

See the [CompositeVoice reference](https://mastra.ai/reference/voice/composite-voice) for more details on AI SDK integration.

## Realtime voice providers

When using realtime voice providers like `OpenAIRealtimeVoice`, the `speak()` method behaves differently:

- Instead of returning an audio stream, it emits a 'speaker' event with the audio data
- You need to register an event listener to receive the audio chunks

```typescript
import { OpenAIRealtimeVoice } from '@mastra/voice-openai-realtime'
import Speaker from '@mastra/node-speaker'

const speaker = new Speaker({
  sampleRate: 24000, // Audio sample rate in Hz - 24 kHz matches OpenAI realtime audio output
  channels: 1, // Mono audio output (as opposed to stereo, which would be 2)
  bitDepth: 16, // Bit depth for audio quality - CD-quality standard (16-bit resolution)
})

const voice = new OpenAIRealtimeVoice()
await voice.connect()

// Register event listener for audio chunks
voice.on('speaker', stream => {
  // Handle audio chunk (e.g., play it or save it)
  stream.pipe(speaker)
})

// This will emit 'speaker' events instead of returning a stream
await voice.speak('Hello, this is realtime speech!')
```

## Notes

- The behavior of `speak()` may vary slightly between providers, but all implementations follow the same basic interface.
- When using a realtime voice provider, the method might not return an audio stream directly but instead emit a 'speaker' event.
- If a text stream is provided as input, the provider will typically convert it to a string before processing.
- The audio format of the returned stream depends on the provider. Common formats include MP3, WAV, and OGG.
- For best performance, consider closing or ending the audio stream when you're done with it.
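When you need the whole audio payload at once (for example, to attach it to an HTTP response body), the returned stream can be buffered in memory. A small hedged sketch (the `streamToBuffer` helper is our own; it relies only on the standard Node stream events `'data'`, `'end'`, and `'error'`):

```typescript
// Collect an audio stream returned by speak() into a single Buffer.
// Suitable for short clips; for long audio, prefer piping the stream directly.
function streamToBuffer(stream: NodeJS.ReadableStream): Promise<Buffer> {
  return new Promise((resolve, reject) => {
    const chunks: Buffer[] = []
    stream.on('data', chunk => chunks.push(Buffer.from(chunk)))
    stream.on('end', () => resolve(Buffer.concat(chunks)))
    stream.on('error', reject)
  })
}

// Usage (assuming `voice` is a configured provider):
// const audio = await voice.speak('Hello, world!')
// const buffer = await streamToBuffer(audio)
```

Buffering consumes the stream fully, which also satisfies the note above about ending streams when you are done with them.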