# Voice

Mastra agents can be enhanced with voice capabilities, allowing them to speak responses and listen to user input. You can configure an agent to use either a single voice provider or combine multiple providers for different operations.

## Basic usage

The simplest way to add voice to an agent is to use a single provider for both speaking and listening:

```typescript
import { Agent } from "@mastra/core/agent";
import { OpenAIVoice } from "@mastra/voice-openai";
import { playAudio } from "@mastra/node-audio";

// Initialize the voice provider with default settings
const voice = new OpenAIVoice();

// Create an agent with voice capabilities
export const agent = new Agent({
  id: "voice-agent",
  name: "Voice Agent",
  instructions: `You are a helpful assistant with both STT and TTS capabilities.`,
  model: "openai/gpt-5.1",
  voice,
});

// The agent can now use voice for interaction
const audioStream = await agent.voice.speak("Hello, I'm your AI assistant!", {
  filetype: "m4a",
});

playAudio(audioStream!);

try {
  const transcription = await agent.voice.listen(audioStream);
  console.log(transcription);
} catch (error) {
  console.error("Error transcribing audio:", error);
}
```

## Working with Audio Streams

The `speak()` and `listen()` methods work with Node.js streams. Here's how to save and load audio files:

### Saving Speech Output

The `speak` method returns a stream that you can pipe to a file or speaker.

```typescript
import { createWriteStream } from "fs";
import path from "path";

// Generate speech and save to file
const audio = await agent.voice.speak("Hello, World!");
const filePath = path.join(process.cwd(), "agent.mp3");
const writer = createWriteStream(filePath);

audio.pipe(writer);

await new Promise<void>((resolve, reject) => {
  writer.on("finish", () => resolve());
  writer.on("error", reject);
});
```
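If you need the generated audio in memory rather than on disk (for example to return it from an HTTP response or store it in a database), you can buffer the stream yourself. The helper below is a minimal sketch using only Node.js stream events; `streamToBuffer` is not part of Mastra, just an illustration:

```typescript
// Collect a readable audio stream into a single Buffer.
export const streamToBuffer = (
  stream: NodeJS.ReadableStream,
): Promise<Buffer> => {
  const chunks: Buffer[] = [];
  return new Promise((resolve, reject) => {
    stream.on("data", (chunk) => chunks.push(Buffer.from(chunk)));
    stream.on("error", reject);
    stream.on("end", () => resolve(Buffer.concat(chunks)));
  });
};

// Usage with the agent defined above
const speech = await agent.voice.speak("Hello, World!");
const buffer = await streamToBuffer(speech!);
```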
### Transcribing Audio Input

The `listen` method expects a stream of audio data from a microphone or file.

```typescript
import { createReadStream } from "fs";
import path from "path";

// Read audio file and transcribe
const audioFilePath = path.join(process.cwd(), "agent.m4a");
const audioStream = createReadStream(audioFilePath);

try {
  console.log("Transcribing audio file...");
  const transcription = await agent.voice.listen(audioStream, {
    filetype: "m4a",
  });
  console.log("Transcription:", transcription);
} catch (error) {
  console.error("Error transcribing audio:", error);
}
```

## Speech-to-Speech Voice Interactions

For more dynamic and interactive voice experiences, you can use real-time voice providers that support speech-to-speech capabilities:

```typescript
import { Agent } from "@mastra/core/agent";
import { getMicrophoneStream } from "@mastra/node-audio";
import { OpenAIRealtimeVoice } from "@mastra/voice-openai-realtime";
import { search, calculate } from "../tools";

// Initialize the realtime voice provider
const voice = new OpenAIRealtimeVoice({
  apiKey: process.env.OPENAI_API_KEY,
  model: "gpt-5.1-realtime",
  speaker: "alloy",
});

// Create an agent with speech-to-speech voice capabilities
export const agent = new Agent({
  id: "speech-to-speech-agent",
  name: "Speech-to-Speech Agent",
  instructions: `You are a helpful assistant with speech-to-speech capabilities.`,
  model: "openai/gpt-5.1",
  tools: {
    // Tools configured on the Agent are passed to the voice provider
    search,
    calculate,
  },
  voice,
});

// Establish a WebSocket connection
await agent.voice.connect();

// Start a conversation
agent.voice.speak("Hello, I'm your AI assistant!");

// Stream audio from a microphone
const microphoneStream = getMicrophoneStream();
agent.voice.send(microphoneStream);

// When done with the conversation
agent.voice.close();
```

### Event System

The realtime voice provider emits several events you can listen for:

```typescript
// Listen for speech audio data sent from the voice provider
agent.voice.on("speaking", ({ audio }) => {
  // audio contains ReadableStream or Int16Array audio data
});

// Listen for transcribed text from both the voice provider and the user
agent.voice.on("writing", ({ text, role }) => {
  console.log(`${role} said: ${text}`);
});

// Listen for errors
agent.voice.on("error", (error) => {
  console.error("Voice error:", error);
});
```
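As a sketch of how these events can be composed, the snippet below keeps a running transcript from `writing` events and writes `speaking` audio to a file when it arrives as `Int16Array` PCM chunks. The output file name is arbitrary, and the payload handling follows the event shapes described above; check your realtime provider's reference for the exact formats it emits.

```typescript
import { createWriteStream } from "fs";

// Arbitrary output file for raw PCM audio received from the provider
const audioOut = createWriteStream("assistant-output.pcm");
const transcript: { role: string; text: string }[] = [];

agent.voice.on("speaking", ({ audio }) => {
  // Per the event description above, audio may be a ReadableStream or Int16Array
  if (audio instanceof Int16Array) {
    audioOut.write(Buffer.from(audio.buffer, audio.byteOffset, audio.byteLength));
  }
});

agent.voice.on("writing", ({ text, role }) => {
  // Keep a running transcript of both sides of the conversation
  transcript.push({ role, text });
});
```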
## Examples

### End-to-end voice interaction

This example demonstrates a voice interaction between two agents. The hybrid voice agent, which configures speaking and listening separately through `CompositeVoice`, speaks a question that is saved as an audio file. The unified voice agent listens to that file, processes the question, generates a response, and speaks it back. Both audio outputs are saved to the `audio` directory.

The following files are created:

- **hybrid-question.mp3** – Hybrid agent's spoken question.
- **unified-response.mp3** – Unified agent's spoken response.

```typescript
import "dotenv/config";
import path from "path";
import fs, { createReadStream, createWriteStream } from "fs";
import { Agent } from "@mastra/core/agent";
import { CompositeVoice } from "@mastra/core/voice";
import { OpenAIVoice } from "@mastra/voice-openai";
import { Mastra } from "@mastra/core";

// Saves an audio stream to a file in the audio directory, creating the directory if it doesn't exist.
export const saveAudioToFile = async (
  audio: NodeJS.ReadableStream,
  filename: string,
): Promise<void> => {
  const audioDir = path.join(process.cwd(), "audio");
  const filePath = path.join(audioDir, filename);
  await fs.promises.mkdir(audioDir, { recursive: true });
  const writer = createWriteStream(filePath);
  audio.pipe(writer);
  return new Promise((resolve, reject) => {
    writer.on("finish", resolve);
    writer.on("error", reject);
  });
};

// Converts a transcription result to plain text, buffering it if it arrives as a stream.
export const convertToText = async (
  input: string | NodeJS.ReadableStream,
): Promise<string> => {
  if (typeof input === "string") {
    return input;
  }
  const chunks: Buffer[] = [];
  return new Promise((resolve, reject) => {
    input.on("data", (chunk) => chunks.push(Buffer.from(chunk)));
    input.on("error", reject);
    input.on("end", () => resolve(Buffer.concat(chunks).toString("utf-8")));
  });
};

export const hybridVoiceAgent = new Agent({
  id: "hybrid-voice-agent",
  name: "Hybrid Voice Agent",
  model: "openai/gpt-5.1",
  instructions: "You can speak and listen using different providers.",
  voice: new CompositeVoice({
    input: new OpenAIVoice(),
    output: new OpenAIVoice(),
  }),
});

export const unifiedVoiceAgent = new Agent({
  id: "unified-voice-agent",
  name: "Unified Voice Agent",
  instructions: "You are an agent with both STT and TTS capabilities.",
  model: "openai/gpt-5.1",
  voice: new OpenAIVoice(),
});

export const mastra = new Mastra({
  agents: { hybridVoiceAgent, unifiedVoiceAgent },
});

const question = "What is the meaning of life in one sentence?";

const hybridSpoken = await hybridVoiceAgent.voice.speak(question);

await saveAudioToFile(hybridSpoken!, "hybrid-question.mp3");

const audioStream = createReadStream(
  path.join(process.cwd(), "audio", "hybrid-question.mp3"),
);

const unifiedHeard = await unifiedVoiceAgent.voice.listen(audioStream);
const inputText = await convertToText(unifiedHeard!);

const unifiedResponse = await unifiedVoiceAgent.generate(inputText);
const unifiedSpoken = await unifiedVoiceAgent.voice.speak(unifiedResponse.text);

await saveAudioToFile(unifiedSpoken!, "unified-response.mp3");
```

### Using Multiple Providers

For more flexibility, you can use different providers for speaking and listening using the `CompositeVoice` class:

```typescript
import { Agent } from "@mastra/core/agent";
import { CompositeVoice } from "@mastra/core/voice";
import { OpenAIVoice } from "@mastra/voice-openai";
import { PlayAIVoice } from "@mastra/voice-playai";

export const agent = new Agent({
  id: "voice-agent",
  name: "Voice Agent",
  instructions: `You are a helpful assistant with both STT and TTS capabilities.`,
  model: "openai/gpt-5.1",
  // Create a composite voice using OpenAI for listening and PlayAI for speaking
  voice: new CompositeVoice({
    input: new OpenAIVoice(),
    output: new PlayAIVoice(),
  }),
});
```
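Individual providers also accept configuration at construction time. The sketch below is hedged: the `speechModel`, `listeningModel`, and `speaker` option names are assumptions based on the OpenAI provider, and other providers expose their own configuration shapes, so consult the provider reference for the authoritative options.

```typescript
import { OpenAIVoice } from "@mastra/voice-openai";

// Option names here are assumptions based on the OpenAI voice provider;
// check the provider reference before relying on them.
const configuredVoice = new OpenAIVoice({
  speechModel: { name: "tts-1-hd" }, // model used by speak()
  listeningModel: { name: "whisper-1" }, // model used by listen()
  speaker: "alloy", // default voice
});

// A per-call speaker override when speaking (assumed option)
const audio = await configuredVoice.speak("Welcome back!", { speaker: "nova" });
```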
### Using AI SDK

Mastra supports using AI SDK's transcription and speech models directly in `CompositeVoice`, giving you access to a wide range of providers through the AI SDK ecosystem:

```typescript
import { Agent } from "@mastra/core/agent";
import { CompositeVoice } from "@mastra/core/voice";
import { openai } from "@ai-sdk/openai";
import { elevenlabs } from "@ai-sdk/elevenlabs";

export const agent = new Agent({
  id: "aisdk-voice-agent",
  name: "AI SDK Voice Agent",
  instructions: `You are a helpful assistant with voice capabilities.`,
  model: "openai/gpt-5.1",
  // Pass AI SDK models directly to CompositeVoice
  voice: new CompositeVoice({
    input: openai.transcription("whisper-1"), // AI SDK transcription model
    output: elevenlabs.speech("eleven_turbo_v2"), // AI SDK speech model
  }),
});

// Use voice capabilities as usual
const audioStream = await agent.voice.speak("Hello!");
const transcribedText = await agent.voice.listen(audioStream);
```

#### Mix and Match Providers

You can mix AI SDK models with Mastra voice providers:

```typescript
import { CompositeVoice } from "@mastra/core/voice";
import { PlayAIVoice } from "@mastra/voice-playai";
import { openai } from "@ai-sdk/openai";

// Use AI SDK for transcription and a Mastra provider for speech
const voice = new CompositeVoice({
  input: openai.transcription("whisper-1"), // AI SDK
  output: new PlayAIVoice(), // Mastra provider
});
```

For the complete list of supported AI SDK providers and their capabilities:

- [Transcription](https://ai-sdk.dev/docs/providers/openai/transcription)
- [Speech](https://ai-sdk.dev/docs/providers/elevenlabs/speech)

## Supported Voice Providers

Mastra supports multiple voice providers for text-to-speech (TTS) and speech-to-text (STT) capabilities:

| Provider        | Package                         | Features                  | Reference                                                                    |
| --------------- | ------------------------------- | ------------------------- | ---------------------------------------------------------------------------- |
| OpenAI          | `@mastra/voice-openai`          | TTS, STT                  | [Documentation](https://mastra.ai/reference/voice/openai/llms.txt)           |
| OpenAI Realtime | `@mastra/voice-openai-realtime` | Realtime speech-to-speech | [Documentation](https://mastra.ai/reference/voice/openai-realtime/llms.txt)  |
| ElevenLabs      | `@mastra/voice-elevenlabs`      | High-quality TTS          | [Documentation](https://mastra.ai/reference/voice/elevenlabs/llms.txt)       |
| PlayAI          | `@mastra/voice-playai`          | TTS                       | [Documentation](https://mastra.ai/reference/voice/playai/llms.txt)           |
| Google          | `@mastra/voice-google`          | TTS, STT                  | [Documentation](https://mastra.ai/reference/voice/google/llms.txt)           |
| Deepgram        | `@mastra/voice-deepgram`        | STT                       | [Documentation](https://mastra.ai/reference/voice/deepgram/llms.txt)         |
| Murf            | `@mastra/voice-murf`            | TTS                       | [Documentation](https://mastra.ai/reference/voice/murf/llms.txt)             |
| Speechify       | `@mastra/voice-speechify`       | TTS                       | [Documentation](https://mastra.ai/reference/voice/speechify/llms.txt)        |
| Sarvam          | `@mastra/voice-sarvam`          | TTS, STT                  | [Documentation](https://mastra.ai/reference/voice/sarvam/llms.txt)           |
| Azure           | `@mastra/voice-azure`           | TTS, STT                  | [Documentation](https://mastra.ai/reference/voice/mastra-voice/llms.txt)     |
| Cloudflare      | `@mastra/voice-cloudflare`      | TTS                       | [Documentation](https://mastra.ai/reference/voice/mastra-voice/llms.txt)     |

## Next Steps

- [Voice API Reference](https://mastra.ai/reference/voice/mastra-voice/llms.txt) - Detailed API documentation for voice capabilities
- [Text to Speech Examples](https://github.com/mastra-ai/voice-examples/tree/main/text-to-speech) - Interactive story generator and other TTS implementations
- [Speech to Text Examples](https://github.com/mastra-ai/voice-examples/tree/main/speech-to-text) - Voice memo app and other STT implementations
- [Speech to Speech Examples](https://github.com/mastra-ai/voice-examples/tree/main/speech-to-speech) - Real-time voice conversation with call analysis