Today we're releasing Mastra Voice: features that let you have real-time voice conversations with AI agents.
With Mastra Voice, you can make your agents speak and listen through a clean, flexible API. Whether you use a single provider for both speech-to-text and text-to-speech or combine multiple providers for different operations, the implementation stays simple.
Here’s an example of building a helpful voice assistant that asks you how you’re doing:
import { Agent } from "@mastra/core/agent";
import { OpenAIVoice } from "@mastra/voice-openai";
import { openai } from "@ai-sdk/openai";

const agent = new Agent({
  name: "Agent",
  instructions: `You are a helpful assistant with voice capabilities.`,
  model: openai("gpt-4o"),
  voice: new OpenAIVoice(),
});

// Convert text to speech and get back an audio stream
const audioStream = await agent.speak("How are you today?");
Mastra Voice works with Node.js streams, making it simple to save speech output to files and even transcribe audio input from various sources.
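For example, here's a minimal sketch of saving the spoken response to disk using Node's stream utilities, assuming the agent from the snippet above and an example output path:

import { createWriteStream } from "fs";
import { pipeline } from "stream/promises";

// Pipe the speech stream straight into a file
const audioStream = await agent.speak("How are you today?");
await pipeline(audioStream, createWriteStream("greeting.mp3"));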
Why voice matters
Humans don’t communicate only in writing, and we believe agents shouldn’t either.
Voice interfaces significantly expand what you can build with AI agents. Text-based interactions work well in many cases, but they can also create unnecessary friction: imagine always having to type out your thoughts and then read the response. What if you need hands-free operation, have accessibility requirements to meet, or simply prefer the natural back-and-forth of human speech?
Sometimes voice communication is simply more efficient.
Here’s an example of a voice assistant that listens first, transcribing the user’s speech to text:
import fs from "fs";
import { Agent } from "@mastra/core/agent";
import { OpenAIVoice } from "@mastra/voice-openai";
import { openai } from "@ai-sdk/openai";

const agent = new Agent({
  name: "Agent",
  instructions: `You are a helpful assistant with voice capabilities.`,
  model: openai("gpt-4o"),
  voice: new OpenAIVoice(),
});

// Transcribe speech from an audio stream to text
const audioStream = fs.createReadStream("/path/to.mp3");
const text = await agent.listen(audioStream);
// Hey! How are ya!
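Putting speak and listen together gives you a full conversational turn. Here's a sketch, assuming agent.generate() from @mastra/core for producing the text reply:

// One turn of a voice conversation: transcribe, think, respond aloud
const question = await agent.listen(fs.createReadStream("/path/to.mp3"));
const response = await agent.generate(question);
const replyStream = await agent.speak(response.text);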
Supported Providers
The table below shows which voice providers we currently support and what each can do. Each provider offers a different combination of speech-to-text and text-to-speech functionality.

Mix and Match Voice Providers
The CompositeVoice class is particularly useful when you want to leverage different providers' strengths for different operations. For example, you might prefer OpenAI's speech recognition accuracy but PlayAI's voice quality or cost structure for generating responses.
import { OpenAIVoice } from "@mastra/voice-openai";
import { PlayAIVoice } from "@mastra/voice-playai";
import { CompositeVoice } from "@mastra/core/voice";

// OpenAI Voice provider
const openaiVoice = new OpenAIVoice({
  listeningModel: {
    name: "whisper-1",
    apiKey: "your-openai-api-key",
  },
});

// PlayAI Voice provider
const playAIVoice = new PlayAIVoice({
  speechModel: {
    name: "PlayDialog",
    apiKey: process.env.PLAYAI_API_KEY,
    userId: process.env.PLAYAI_USER_ID,
  },
  speaker: "Angelo", // Default voice
});

// Use CompositeVoice to mix and match providers
const customVoice = new CompositeVoice({
  listenProvider: openaiVoice, // Use OpenAI for speech recognition
  speakProvider: playAIVoice, // Use PlayAI for speech synthesis
});
Each provider implementation in Mastra follows the same interface, making it straightforward to swap between them or add new providers as they become available. This gives you flexibility while maintaining a consistent developer experience across your application.
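Because CompositeVoice exposes the same interface as the individual providers, you can hand it straight to an agent. Here's a minimal sketch, reusing the customVoice instance defined above:

import { Agent } from "@mastra/core/agent";
import { openai } from "@ai-sdk/openai";

// A composite provider drops in anywhere a single provider would
const agent = new Agent({
  name: "Agent",
  instructions: `You are a helpful assistant with voice capabilities.`,
  model: openai("gpt-4o"),
  voice: customVoice, // listens via OpenAI, speaks via PlayAI
});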
An example
This demo shows a voice-enabled agent in action: a natural conversation with AI, no typing required, for a more fluid experience.
We're actively developing Mastra! If you encounter any issues or have suggestions for improvements, please open an issue on our GitHub repository or contribute directly with a pull request.
Get started with Mastra Voice today by installing the latest version of our packages:
npm install @mastra/core @mastra/voice-openai
The full documentation is available at mastra.ai/docs/reference/voice/mastra-voice with additional examples and configuration options.