
Voice

Mastra agents can be enhanced with voice capabilities, allowing them to speak responses and listen to user input. You can configure an agent to use either a single voice provider or combine multiple providers for different operations.

Basic usage

The simplest way to add voice to an agent is to use a single provider for both speaking and listening:

import { playAudio } from "@mastra/node-audio";
import { Agent } from "@mastra/core/agent";
import { OpenAIVoice } from "@mastra/voice-openai";

// Initialize the voice provider with default settings
const voice = new OpenAIVoice();

// Create an agent with voice capabilities
export const agent = new Agent({
  id: "voice-agent",
  name: "Voice Agent",
  instructions: `You are a helpful assistant with both STT and TTS capabilities.`,
  model: "openai/gpt-5.1",
  voice,
});

// The agent can now use voice for interaction
const audioStream = await agent.voice.speak("Hello, I'm your AI assistant!", {
  filetype: "m4a",
});

playAudio(audioStream!);

try {
  const transcription = await agent.voice.listen(audioStream);
  console.log(transcription);
} catch (error) {
  console.error("Error transcribing audio:", error);
}
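If you don't want the provider defaults, the constructor can also take explicit model and speaker settings. A minimal sketch, assuming the options are named speechModel, listeningModel, and speaker (check the @mastra/voice-openai reference for the exact shape):

import { OpenAIVoice } from "@mastra/voice-openai";

// Option names below are assumptions; consult the provider reference before relying on them
const configuredVoice = new OpenAIVoice({
  speechModel: { name: "tts-1-hd", apiKey: process.env.OPENAI_API_KEY },
  listeningModel: { name: "whisper-1", apiKey: process.env.OPENAI_API_KEY },
  speaker: "alloy", // default voice used by speak()
});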

Working with Audio Streams

The speak() and listen() methods work with Node.js streams. Here's how to save and load audio files:

Saving Speech Output

The speak method returns a stream that you can pipe to a file or speaker.

import { createWriteStream } from "fs";
import path from "path";

// Generate speech and save to file
const audio = await agent.voice.speak("Hello, World!");
const filePath = path.join(process.cwd(), "agent.mp3");
const writer = createWriteStream(filePath);

audio.pipe(writer);

await new Promise<void>((resolve, reject) => {
  writer.on("finish", () => resolve());
  writer.on("error", reject);
});
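If you prefer not to wire up the finish and error handlers yourself, Node's stream/promises pipeline helper does the same thing. A small sketch under the same assumptions (an agent configured with a voice provider, as above):

import { createWriteStream } from "fs";
import path from "path";
import { pipeline } from "stream/promises";

// Generate speech and stream it straight into a file
const audio = await agent.voice.speak("Hello, World!");
await pipeline(audio!, createWriteStream(path.join(process.cwd(), "agent.mp3")));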

Transcribing Audio Input

The listen method expects a stream of audio data from a microphone or file.

import { createReadStream } from "fs";
import path from "path";

// Read audio file and transcribe
const audioFilePath = path.join(process.cwd(), "/agent.m4a");
const audioStream = createReadStream(audioFilePath);

try {
  console.log("Transcribing audio file...");
  const transcription = await agent.voice.listen(audioStream, {
    filetype: "m4a",
  });
  console.log("Transcription:", transcription);
} catch (error) {
  console.error("Error transcribing audio:", error);
}
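The input doesn't have to come from disk; any Node readable works. For example, if you already have the audio in memory as a Buffer (say, from an upload), you can wrap it in a stream first. A minimal sketch, where audioBuffer is a hypothetical Buffer holding WAV data:

import { Readable } from "stream";

// audioBuffer is a hypothetical Buffer containing recorded audio
const bufferStream = Readable.from(audioBuffer);
const transcription = await agent.voice.listen(bufferStream, {
  filetype: "wav", // match the actual encoding of the buffer
});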

Speech-to-Speech Voice Interactions

For more dynamic and interactive voice experiences, you can use real-time voice providers that support speech-to-speech capabilities:

import { Agent } from "@mastra/core/agent";
import { getMicrophoneStream } from "@mastra/node-audio";
import { OpenAIRealtimeVoice } from "@mastra/voice-openai-realtime";
import { search, calculate } from "../tools";

// Initialize the realtime voice provider
// Initialize the realtime voice provider
const voice = new OpenAIRealtimeVoice({
  apiKey: process.env.OPENAI_API_KEY,
  model: "gpt-5.1-realtime",
  speaker: "alloy",
});

// Create an agent with speech-to-speech voice capabilities
export const agent = new Agent({
  id: "speech-to-speech-agent",
  name: "Speech-to-Speech Agent",
  instructions: `You are a helpful assistant with speech-to-speech capabilities.`,
  model: "openai/gpt-5.1",
  tools: {
    // Tools configured on the agent are passed to the voice provider
    search,
    calculate,
  },
  voice,
});

// Establish a WebSocket connection
await agent.voice.connect();

// Start a conversation
agent.voice.speak("Hello, I'm your AI assistant!");

// Stream audio from a microphone
const microphoneStream = getMicrophoneStream();
agent.voice.send(microphoneStream);

// When done with the conversation
agent.voice.close();

Event System

The realtime voice provider emits several events you can listen for:

// Listen for speech audio data sent from voice provider
agent.voice.on("speaking", ({ audio }) => {
// audio contains ReadableStream or Int16Array audio data
});

// Listen for transcribed text sent from both voice provider and user
agent.voice.on("writing", ({ text, role }) => {
console.log(`${role} said: ${text}`);
});

// Listen for errors
agent.voice.on("error", (error) => {
console.error("Voice error:", error);
});
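Putting these pieces together, one way to get audible output is to forward the audio from the "speaking" event to the playAudio helper used earlier. A sketch based on the events above; it assumes the emitted audio is a stream playAudio can consume (if your provider emits Int16Array chunks you would need to buffer them first):

import { playAudio, getMicrophoneStream } from "@mastra/node-audio";

await agent.voice.connect();

// Play whatever the agent says through the speakers
agent.voice.on("speaking", ({ audio }) => {
  playAudio(audio);
});

// Log the running transcript of both sides of the conversation
agent.voice.on("writing", ({ text, role }) => {
  console.log(`${role}: ${text}`);
});

// Feed microphone input to the agent
agent.voice.send(getMicrophoneStream());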

Examples

End-to-end voice interaction

This example demonstrates a voice interaction between two agents. The hybrid voice agent, which uses multiple providers, speaks a question, which is saved as an audio file. The unified voice agent listens to that file, processes the question, generates a response, and speaks it back. Both audio outputs are saved to the audio directory.

The following files are created:

  • hybrid-question.mp3 – Hybrid agent's spoken question.
  • unified-response.mp3 – Unified agent's spoken response.
src/test-voice-agents.ts
import "dotenv/config";

import path from "path";
import { createReadStream } from "fs";
import { Agent } from "@mastra/core/agent";
import { CompositeVoice } from "@mastra/core/voice";
import { OpenAIVoice } from "@mastra/voice-openai";
import { Mastra } from "@mastra/core";

// Saves an audio stream to a file in the audio directory, creating the directory if it doesn't exist.
export const saveAudioToFile = async (
  audio: NodeJS.ReadableStream,
  filename: string,
): Promise<void> => {
  const audioDir = path.join(process.cwd(), "audio");
  const filePath = path.join(audioDir, filename);

  await fs.promises.mkdir(audioDir, { recursive: true });

  const writer = createWriteStream(filePath);
  audio.pipe(writer);
  return new Promise<void>((resolve, reject) => {
    writer.on("finish", () => resolve());
    writer.on("error", reject);
  });
};

// Converts a transcription result to text, reading the stream if one is returned.
export const convertToText = async (
  input: string | NodeJS.ReadableStream,
): Promise<string> => {
  if (typeof input === "string") {
    return input;
  }

  const chunks: Buffer[] = [];
  return new Promise<string>((resolve, reject) => {
    input.on("data", (chunk) => chunks.push(Buffer.from(chunk)));
    input.on("error", reject);
    input.on("end", () => resolve(Buffer.concat(chunks).toString("utf-8")));
  });
};

export const hybridVoiceAgent = new Agent({
  id: "hybrid-voice-agent",
  name: "Hybrid Voice Agent",
  model: "openai/gpt-5.1",
  instructions: "You can speak and listen using different providers.",
  voice: new CompositeVoice({
    input: new OpenAIVoice(),
    output: new OpenAIVoice(),
  }),
});

export const unifiedVoiceAgent = new Agent({
  id: "unified-voice-agent",
  name: "Unified Voice Agent",
  instructions: "You are an agent with both STT and TTS capabilities.",
  model: "openai/gpt-5.1",
  voice: new OpenAIVoice(),
});

export const mastra = new Mastra({
  agents: { hybridVoiceAgent, unifiedVoiceAgent },
});

// Registered agents can also be retrieved from the Mastra instance,
// e.g. mastra.getAgent("hybridVoiceAgent"); here we use the exported agents directly.

const question = "What is the meaning of life in one sentence?";

const hybridSpoken = await hybridVoiceAgent.voice.speak(question);

await saveAudioToFile(hybridSpoken!, "hybrid-question.mp3");

const audioStream = createReadStream(
  path.join(process.cwd(), "audio", "hybrid-question.mp3"),
);
const unifiedHeard = await unifiedVoiceAgent.voice.listen(audioStream);

const inputText = await convertToText(unifiedHeard!);

const unifiedResponse = await unifiedVoiceAgent.generate(inputText);
const unifiedSpoken = await unifiedVoiceAgent.voice.speak(unifiedResponse.text);

await saveAudioToFile(unifiedSpoken!, "unified-response.mp3");

Using Multiple Providers

For more flexibility, you can use different providers for speaking and listening using the CompositeVoice class:

import { Agent } from "@mastra/core/agent";
import { CompositeVoice } from "@mastra/core/voice";
import { OpenAIVoice } from "@mastra/voice-openai";
import { PlayAIVoice } from "@mastra/voice-playai";

export const agent = new Agent({
  id: "voice-agent",
  name: "Voice Agent",
  instructions: `You are a helpful assistant with both STT and TTS capabilities.`,
  model: "openai/gpt-5.1",

  // Create a composite voice using OpenAI for listening and PlayAI for speaking
  voice: new CompositeVoice({
    input: new OpenAIVoice(),
    output: new PlayAIVoice(),
  }),
});
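With this setup, listen() is handled by the OpenAI provider and speak() by PlayAI; the calling code stays the same as with a single provider:

// speak() uses the output provider (PlayAI), listen() uses the input provider (OpenAI)
const audio = await agent.voice.speak("Which provider am I using?");
const text = await agent.voice.listen(audio);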

Using AI SDK

Mastra supports using AI SDK's transcription and speech models directly in CompositeVoice, giving you access to a wide range of providers through the AI SDK ecosystem:

import { Agent } from "@mastra/core/agent";
import { CompositeVoice } from "@mastra/core/voice";
import { openai } from "@ai-sdk/openai";
import { elevenlabs } from "@ai-sdk/elevenlabs";

export const agent = new Agent({
  id: "aisdk-voice-agent",
  name: "AI SDK Voice Agent",
  instructions: `You are a helpful assistant with voice capabilities.`,
  model: "openai/gpt-5.1",

  // Pass AI SDK models directly to CompositeVoice
  voice: new CompositeVoice({
    input: openai.transcription("whisper-1"), // AI SDK transcription model
    output: elevenlabs.speech("eleven_turbo_v2"), // AI SDK speech model
  }),
});

// Use voice capabilities as usual
const audioStream = await agent.voice.speak("Hello!");
const transcribedText = await agent.voice.listen(audioStream);
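Because any AI SDK transcription or speech model can be plugged in, swapping providers is a one-line change. A sketch using Groq's hosted Whisper model for transcription (the model id is assumed from the AI SDK Groq provider; adjust it to whatever model you actually use):

import { CompositeVoice } from "@mastra/core/voice";
import { groq } from "@ai-sdk/groq";
import { elevenlabs } from "@ai-sdk/elevenlabs";

const voice = new CompositeVoice({
  input: groq.transcription("whisper-large-v3"), // Groq transcription via AI SDK (assumed model id)
  output: elevenlabs.speech("eleven_turbo_v2"),
});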

Mix and Match Providers

You can mix AI SDK models with Mastra voice providers:

import { CompositeVoice } from "@mastra/core/voice";
import { PlayAIVoice } from "@mastra/voice-playai";
import { openai } from "@ai-sdk/openai";

// Use AI SDK for transcription and Mastra provider for speech
const voice = new CompositeVoice({
  input: openai.transcription("whisper-1"), // AI SDK
  output: new PlayAIVoice(), // Mastra provider
});

For the complete list of supported AI SDK providers and their capabilities, see the AI SDK documentation.

Supported Voice Providers

Mastra supports multiple voice providers for text-to-speech (TTS) and speech-to-text (STT) capabilities:

Provider          Package                          Features
OpenAI            @mastra/voice-openai             TTS, STT
OpenAI Realtime   @mastra/voice-openai-realtime    Realtime speech-to-speech
ElevenLabs        @mastra/voice-elevenlabs         High-quality TTS
PlayAI            @mastra/voice-playai             TTS
Google            @mastra/voice-google             TTS, STT
Deepgram          @mastra/voice-deepgram           STT
Murf              @mastra/voice-murf               TTS
Speechify         @mastra/voice-speechify          TTS
Sarvam            @mastra/voice-sarvam             TTS, STT
Azure             @mastra/voice-azure              TTS, STT
Cloudflare        @mastra/voice-cloudflare         TTS
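Any TTS provider from this table can be paired with any STT provider through CompositeVoice. For example, a sketch combining Deepgram for listening with ElevenLabs for speaking (the class names follow the pattern of the providers shown earlier; check each package's reference for the exact export and constructor options):

import { CompositeVoice } from "@mastra/core/voice";
import { DeepgramVoice } from "@mastra/voice-deepgram"; // assumed export name
import { ElevenLabsVoice } from "@mastra/voice-elevenlabs"; // assumed export name

// Deepgram handles listen(), ElevenLabs handles speak()
const voice = new CompositeVoice({
  input: new DeepgramVoice(),
  output: new ElevenLabsVoice(),
});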

Next Steps