Voice in Mastra

Mastra’s Voice system provides a unified interface for voice interactions, enabling text-to-speech (TTS), speech-to-text (STT), and real-time speech-to-speech (STS) capabilities in your applications.

Adding Voice to Agents

To learn how to integrate voice capabilities into your agents, check out the Adding Voice to Agents documentation. This section covers how to use both single and multiple voice providers, as well as real-time interactions.


import { Agent } from "@mastra/core/agent";
import { openai } from "@ai-sdk/openai";
import { OpenAIVoice } from "@mastra/voice-openai";
 
// Initialize OpenAI voice for TTS
 
const voiceAgent = new Agent({
  name: "Voice Agent",
  instructions:
    "You are a voice assistant that can help users with their tasks.",
  model: openai("gpt-4o"),
  voice: new OpenAIVoice(),
});

You can then use the following voice capabilities:

Text to Speech (TTS)

Turn your agent’s responses into natural-sounding speech using Mastra’s TTS capabilities. Choose from multiple providers like OpenAI, ElevenLabs, and more.

For detailed configuration options and advanced features, check out our Text-to-Speech guide.

OpenAI


import { Agent } from '@mastra/core/agent';
import { openai } from '@ai-sdk/openai';
import { OpenAIVoice } from "@mastra/voice-openai";
import { playAudio } from "@mastra/node-audio";
 
const voiceAgent = new Agent({
name: "Voice Agent",
instructions: "You are a voice assistant that can help users with their tasks.",
model: openai("gpt-4o"),
voice: new OpenAIVoice(),
});
 
const { text } = await voiceAgent.generate('What color is the sky?');
 
// Convert text to speech to an Audio Stream
const audioStream = await voiceAgent.voice.speak(text, {
speaker: "default", // Optional: specify a speaker
responseFormat: "wav", // Optional: specify a response format
});
 
playAudio(audioStream);

Visit the OpenAI Voice Reference for more information on the OpenAI voice provider.

Azure


import { Agent } from '@mastra/core/agent';
import { openai } from '@ai-sdk/openai';
import { AzureVoice } from "@mastra/voice-azure";
import { playAudio } from "@mastra/node-audio";
 
const voiceAgent = new Agent({
  name: "Voice Agent",
  instructions: "You are a voice assistant that can help users with their tasks.",
  model: openai("gpt-4o"),
  voice: new AzureVoice(),
});
 
const { text } = await voiceAgent.generate('What color is the sky?');
 
// Convert text to speech to an Audio Stream
const audioStream = await voiceAgent.voice.speak(text, {
  speaker: "en-US-JennyNeural", // Optional: specify a speaker
});
 
playAudio(audioStream);

Visit the Azure Voice Reference for more information on the Azure voice provider.

ElevenLabs


import { Agent } from '@mastra/core/agent';
import { openai } from '@ai-sdk/openai';
import { ElevenLabsVoice } from "@mastra/voice-elevenlabs";
import { playAudio } from "@mastra/node-audio";
 
const voiceAgent = new Agent({
name: "Voice Agent",
instructions: "You are a voice assistant that can help users with their tasks.",
model: openai("gpt-4o"),
voice: new ElevenLabsVoice(),
});
 
const { text } = await voiceAgent.generate('What color is the sky?');
 
// Convert text to speech to an Audio Stream
const audioStream = await voiceAgent.voice.speak(text, {
speaker: "default", // Optional: specify a speaker
});
 
playAudio(audioStream);

Visit the ElevenLabs Voice Reference for more information on the ElevenLabs voice provider.

PlayAI


import { Agent } from '@mastra/core/agent';
import { openai } from '@ai-sdk/openai';
import { PlayAIVoice } from "@mastra/voice-playai";
import { playAudio } from "@mastra/node-audio";
 
const voiceAgent = new Agent({
  name: "Voice Agent",
  instructions: "You are a voice assistant that can help users with their tasks.",
  model: openai("gpt-4o"),
  voice: new PlayAIVoice(),
});
 
const { text } = await voiceAgent.generate('What color is the sky?');
 
// Convert text to speech to an Audio Stream
const audioStream = await voiceAgent.voice.speak(text, {
  speaker: "default", // Optional: specify a speaker
});
 
playAudio(audioStream);

Visit the PlayAI Voice Reference for more information on the PlayAI voice provider.

Google


import { Agent } from '@mastra/core/agent';
import { openai } from '@ai-sdk/openai';
import { GoogleVoice } from "@mastra/voice-google";
import { playAudio } from "@mastra/node-audio";
 
const voiceAgent = new Agent({
name: "Voice Agent",
instructions: "You are a voice assistant that can help users with their tasks.",
model: openai("gpt-4o"),
voice: new GoogleVoice(),
});
 
const { text } = await voiceAgent.generate('What color is the sky?');
 
// Convert text to speech to an Audio Stream
const audioStream = await voiceAgent.voice.speak(text, {
speaker: "en-US-Studio-O", // Optional: specify a speaker
});
 
playAudio(audioStream);

Visit the Google Voice Reference for more information on the Google voice provider.

Cloudflare


import { Agent } from '@mastra/core/agent';
import { openai } from '@ai-sdk/openai';
import { CloudflareVoice } from "@mastra/voice-cloudflare";
import { playAudio } from "@mastra/node-audio";
 
const voiceAgent = new Agent({
  name: "Voice Agent",
  instructions: "You are a voice assistant that can help users with their tasks.",
  model: openai("gpt-4o"),
  voice: new CloudflareVoice(),
});
 
const { text } = await voiceAgent.generate('What color is the sky?');
 
// Convert text to speech to an Audio Stream
const audioStream = await voiceAgent.voice.speak(text, {
  speaker: "default", // Optional: specify a speaker
});
 
playAudio(audioStream);

Visit the Cloudflare Voice Reference for more information on the Cloudflare voice provider.

Deepgram


import { Agent } from '@mastra/core/agent';
import { openai } from '@ai-sdk/openai';
import { DeepgramVoice } from "@mastra/voice-deepgram";
import { playAudio } from "@mastra/node-audio";
 
const voiceAgent = new Agent({
name: "Voice Agent",
instructions: "You are a voice assistant that can help users with their tasks.",
model: openai("gpt-4o"),
voice: new DeepgramVoice(),
});
 
const { text } = await voiceAgent.generate('What color is the sky?');
 
// Convert text to speech to an Audio Stream
const audioStream = await voiceAgent.voice.speak(text, {
speaker: "aura-english-us", // Optional: specify a speaker
});
 
playAudio(audioStream);

Visit the Deepgram Voice Reference for more information on the Deepgram voice provider.

Speechify


import { Agent } from '@mastra/core/agent';
import { openai } from '@ai-sdk/openai';
import { SpeechifyVoice } from "@mastra/voice-speechify";
import { playAudio } from "@mastra/node-audio";
 
const voiceAgent = new Agent({
  name: "Voice Agent",
  instructions: "You are a voice assistant that can help users with their tasks.",
  model: openai("gpt-4o"),
  voice: new SpeechifyVoice(),
});
 
const { text } = await voiceAgent.generate('What color is the sky?');
 
// Convert text to speech to an Audio Stream
const audioStream = await voiceAgent.voice.speak(text, {
  speaker: "matthew", // Optional: specify a speaker
});
 
playAudio(audioStream);

Visit the Speechify Voice Reference for more information on the Speechify voice provider.

Sarvam


import { Agent } from '@mastra/core/agent';
import { openai } from '@ai-sdk/openai';
import { SarvamVoice } from "@mastra/voice-sarvam";
import { playAudio } from "@mastra/node-audio";
 
const voiceAgent = new Agent({
name: "Voice Agent",
instructions: "You are a voice assistant that can help users with their tasks.",
model: openai("gpt-4o"),
voice: new SarvamVoice(),
});
 
const { text } = await voiceAgent.generate('What color is the sky?');
 
// Convert text to speech to an Audio Stream
const audioStream = await voiceAgent.voice.speak(text, {
speaker: "default", // Optional: specify a speaker
});
 
playAudio(audioStream);

Visit the Sarvam Voice Reference for more information on the Sarvam voice provider.

Murf


import { Agent } from '@mastra/core/agent';
import { openai } from '@ai-sdk/openai';
import { MurfVoice } from "@mastra/voice-murf";
import { playAudio } from "@mastra/node-audio";
 
const voiceAgent = new Agent({
  name: "Voice Agent",
  instructions: "You are a voice assistant that can help users with their tasks.",
  model: openai("gpt-4o"),
  voice: new MurfVoice(),
});
 
const { text } = await voiceAgent.generate('What color is the sky?');
 
// Convert text to speech to an Audio Stream
const audioStream = await voiceAgent.voice.speak(text, {
  speaker: "default", // Optional: specify a speaker
});
 
playAudio(audioStream);

Visit the Murf Voice Reference for more information on the Murf voice provider.

Speech to Text (STT)

Transcribe spoken content using various providers like OpenAI, ElevenLabs, and more. For detailed configuration options and more, check out Speech to Text.

You can download a sample audio file from here .

OpenAI


import { Agent } from '@mastra/core/agent';
import { openai } from '@ai-sdk/openai';
import { OpenAIVoice } from "@mastra/voice-openai";
import { createReadStream } from 'fs';
 
const voiceAgent = new Agent({
name: "Voice Agent",
instructions: "You are a voice assistant that can help users with their tasks.",
model: openai("gpt-4o"),
voice: new OpenAIVoice(),
});
 
// Use an audio file from a URL
const audioStream = await createReadStream("./how_can_i_help_you.mp3");
 
// Convert audio to text
const transcript = await voiceAgent.voice.listen(audioStream);
console.log(`User said: ${transcript}`);
 
// Generate a response based on the transcript
const { text } = await voiceAgent.generate(transcript);

Visit the OpenAI Voice Reference for more information on the OpenAI voice provider.

Azure


import { createReadStream } from 'fs';
import { Agent } from '@mastra/core/agent';
import { openai } from '@ai-sdk/openai';
import { AzureVoice } from "@mastra/voice-azure";
import { createReadStream } from 'fs';
 
const voiceAgent = new Agent({
  name: "Voice Agent",
  instructions: "You are a voice assistant that can help users with their tasks.",
  model: openai("gpt-4o"),
  voice: new AzureVoice(),
});
 
// Use an audio file from a URL
const audioStream = await createReadStream("./how_can_i_help_you.mp3");
 
// Convert audio to text
const transcript = await voiceAgent.voice.listen(audioStream);
console.log(`User said: ${transcript}`);
 
// Generate a response based on the transcript
const { text } = await voiceAgent.generate(transcript);

Visit the Azure Voice Reference for more information on the Azure voice provider.

ElevenLabs


import { Agent } from '@mastra/core/agent';
import { openai } from '@ai-sdk/openai';
import { ElevenLabsVoice } from "@mastra/voice-elevenlabs";
import { createReadStream } from 'fs';
 
const voiceAgent = new Agent({
name: "Voice Agent",
instructions: "You are a voice assistant that can help users with their tasks.",
model: openai("gpt-4o"),
voice: new ElevenLabsVoice(),
});
 
// Use an audio file from a URL
const audioStream = await createReadStream("./how_can_i_help_you.mp3");
 
// Convert audio to text
const transcript = await voiceAgent.voice.listen(audioStream);
console.log(`User said: ${transcript}`);
 
// Generate a response based on the transcript
const { text } = await voiceAgent.generate(transcript);

Visit the ElevenLabs Voice Reference for more information on the ElevenLabs voice provider.

Google


import { Agent } from '@mastra/core/agent';
import { openai } from '@ai-sdk/openai';
import { GoogleVoice } from "@mastra/voice-google";
import { createReadStream } from 'fs';
 
const voiceAgent = new Agent({
  name: "Voice Agent",
  instructions: "You are a voice assistant that can help users with their tasks.",
  model: openai("gpt-4o"),
  voice: new GoogleVoice(),
});
 
// Use an audio file from a URL
const audioStream = await createReadStream("./how_can_i_help_you.mp3");
 
// Convert audio to text
const transcript = await voiceAgent.voice.listen(audioStream);
console.log(`User said: ${transcript}`);
 
// Generate a response based on the transcript
const { text } = await voiceAgent.generate(transcript);

Visit the Google Voice Reference for more information on the Google voice provider.

Cloudflare


import { Agent } from '@mastra/core/agent';
import { openai } from '@ai-sdk/openai';
import { CloudflareVoice } from "@mastra/voice-cloudflare";
import { createReadStream } from 'fs';
 
const voiceAgent = new Agent({
name: "Voice Agent",
instructions: "You are a voice assistant that can help users with their tasks.",
model: openai("gpt-4o"),
voice: new CloudflareVoice(),
});
 
// Use an audio file from a URL
const audioStream = await createReadStream("./how_can_i_help_you.mp3");
 
// Convert audio to text
const transcript = await voiceAgent.voice.listen(audioStream);
console.log(`User said: ${transcript}`);
 
// Generate a response based on the transcript
const { text } = await voiceAgent.generate(transcript);

Visit the Cloudflare Voice Reference for more information on the Cloudflare voice provider.

Deepgram


import { Agent } from '@mastra/core/agent';
import { openai } from '@ai-sdk/openai';
import { DeepgramVoice } from "@mastra/voice-deepgram";
import { createReadStream } from 'fs';
 
const voiceAgent = new Agent({
  name: "Voice Agent",
  instructions: "You are a voice assistant that can help users with their tasks.",
  model: openai("gpt-4o"),
  voice: new DeepgramVoice(),
});
 
// Use an audio file from a URL
const audioStream = await createReadStream("./how_can_i_help_you.mp3");
 
// Convert audio to text
const transcript = await voiceAgent.voice.listen(audioStream);
console.log(`User said: ${transcript}`);
 
// Generate a response based on the transcript
const { text } = await voiceAgent.generate(transcript);

Visit the Deepgram Voice Reference for more information on the Deepgram voice provider.

Sarvam


import { Agent } from '@mastra/core/agent';
import { openai } from '@ai-sdk/openai';
import { SarvamVoice } from "@mastra/voice-sarvam";
import { createReadStream } from 'fs';
 
const voiceAgent = new Agent({
name: "Voice Agent",
instructions: "You are a voice assistant that can help users with their tasks.",
model: openai("gpt-4o"),
voice: new SarvamVoice(),
});
 
// Use an audio file from a URL
const audioStream = await createReadStream("./how_can_i_help_you.mp3");
 
// Convert audio to text
const transcript = await voiceAgent.voice.listen(audioStream);
console.log(`User said: ${transcript}`);
 
// Generate a response based on the transcript
const { text } = await voiceAgent.generate(transcript);

Visit the Sarvam Voice Reference for more information on the Sarvam voice provider.

Speech to Speech (STS)

Create conversational experiences with speech-to-speech capabilities. The unified API enables real-time voice interactions between users and AI agents. For detailed configuration options and advanced features, check out Speech to Speech.

OpenAI


import { Agent } from '@mastra/core/agent';
import { openai } from '@ai-sdk/openai';
import { playAudio, getMicrophoneStream } from '@mastra/node-audio';
import { OpenAIRealtimeVoice } from "@mastra/voice-openai-realtime";
 
const voiceAgent = new Agent({
  name: "Voice Agent",
  instructions: "You are a voice assistant that can help users with their tasks.",
  model: openai("gpt-4o"),
  voice: new OpenAIRealtimeVoice(),
});
 
// Listen for agent audio responses
voiceAgent.voice.on('speaker', ({ audio }) => {
  playAudio(audio);
});
 
// Initiate the conversation
await voiceAgent.voice.speak('How can I help you today?');
 
// Send continuous audio from the microphone
const micStream = getMicrophoneStream();
await voiceAgent.voice.send(micStream);

Visit the OpenAI Voice Reference for more information on the OpenAI voice provider.

Voice Configuration

Each voice provider can be configured with different models and options. Below are the detailed configuration options for all supported providers:

OpenAI


// OpenAI Voice Configuration
const voice = new OpenAIVoice({
  speechModel: {
    name: "gpt-3.5-turbo", // Example model name
    apiKey: process.env.OPENAI_API_KEY,
    language: "en-US", // Language code
    voiceType: "neural", // Type of voice model
  },
  listeningModel: {
    name: "whisper-1", // Example model name
    apiKey: process.env.OPENAI_API_KEY,
    language: "en-US", // Language code
    format: "wav", // Audio format
  },
  speaker: "alloy", // Example speaker name
});

Visit the OpenAI Voice Reference for more information on the OpenAI voice provider.

Azure


// Azure Voice Configuration
const voice = new AzureVoice({
  speechModel: {
    name: "en-US-JennyNeural", // Example model name
    apiKey: process.env.AZURE_SPEECH_KEY,
    region: process.env.AZURE_SPEECH_REGION,
    language: "en-US", // Language code
    style: "cheerful", // Voice style
    pitch: "+0Hz", // Pitch adjustment
    rate: "1.0", // Speech rate
  },
  listeningModel: {
    name: "en-US", // Example model name
    apiKey: process.env.AZURE_SPEECH_KEY,
    region: process.env.AZURE_SPEECH_REGION,
    format: "simple", // Output format
  },
});

Visit the Azure Voice Reference for more information on the Azure voice provider.

ElevenLabs


// ElevenLabs Voice Configuration
const voice = new ElevenLabsVoice({
  speechModel: {
    voiceId: "your-voice-id", // Example voice ID
    model: "eleven_multilingual_v2", // Example model name
    apiKey: process.env.ELEVENLABS_API_KEY,
    language: "en", // Language code
    emotion: "neutral", // Emotion setting
  },
  // ElevenLabs may not have a separate listening model
});

Visit the ElevenLabs Voice Reference for more information on the ElevenLabs voice provider.

PlayAI


// PlayAI Voice Configuration
const voice = new PlayAIVoice({
  speechModel: {
    name: "playai-voice", // Example model name
    speaker: "emma", // Example speaker name
    apiKey: process.env.PLAYAI_API_KEY,
    language: "en-US", // Language code
    speed: 1.0, // Speech speed
  },
  // PlayAI may not have a separate listening model
});

Visit the PlayAI Voice Reference for more information on the PlayAI voice provider.

Google


// Google Voice Configuration
const voice = new GoogleVoice({
  speechModel: {
    name: "en-US-Studio-O", // Example model name
    apiKey: process.env.GOOGLE_API_KEY,
    languageCode: "en-US", // Language code
    gender: "FEMALE", // Voice gender
    speakingRate: 1.0, // Speaking rate
  },
  listeningModel: {
    name: "en-US", // Example model name
    sampleRateHertz: 16000, // Sample rate
  },
});

Visit the PlayAI Voice Reference for more information on the PlayAI voice provider.

Cloudflare


// Cloudflare Voice Configuration
const voice = new CloudflareVoice({
  speechModel: {
    name: "cloudflare-voice", // Example model name
    accountId: process.env.CLOUDFLARE_ACCOUNT_ID,
    apiToken: process.env.CLOUDFLARE_API_TOKEN,
    language: "en-US", // Language code
    format: "mp3", // Audio format
  },
  // Cloudflare may not have a separate listening model
});

Visit the Cloudflare Voice Reference for more information on the Cloudflare voice provider.

Deepgram


// Deepgram Voice Configuration
const voice = new DeepgramVoice({
  speechModel: {
    name: "nova-2", // Example model name
    speaker: "aura-english-us", // Example speaker name
    apiKey: process.env.DEEPGRAM_API_KEY,
    language: "en-US", // Language code
    tone: "formal", // Tone setting
  },
  listeningModel: {
    name: "nova-2", // Example model name
    format: "flac", // Audio format
  },
});

Visit the Deepgram Voice Reference for more information on the Deepgram voice provider.

Speechify


// Speechify Voice Configuration
const voice = new SpeechifyVoice({
  speechModel: {
    name: "speechify-voice", // Example model name
    speaker: "matthew", // Example speaker name
    apiKey: process.env.SPEECHIFY_API_KEY,
    language: "en-US", // Language code
    speed: 1.0, // Speech speed
  },
  // Speechify may not have a separate listening model
});

Visit the Speechify Voice Reference for more information on the Speechify voice provider.

Sarvam


// Sarvam Voice Configuration
const voice = new SarvamVoice({
  speechModel: {
    name: "sarvam-voice", // Example model name
    apiKey: process.env.SARVAM_API_KEY,
    language: "en-IN", // Language code
    style: "conversational", // Style setting
  },
  // Sarvam may not have a separate listening model
});

Visit the Sarvam Voice Reference for more information on the Sarvam voice provider.

Murf


// Murf Voice Configuration
const voice = new MurfVoice({
  speechModel: {
    name: "murf-voice", // Example model name
    apiKey: process.env.MURF_API_KEY,
    language: "en-US", // Language code
    emotion: "happy", // Emotion setting
  },
  // Murf may not have a separate listening model
});

Visit the Murf Voice Reference for more information on the Murf voice provider.

OpenAI Realtime


// OpenAI Realtime Voice Configuration
const voice = new OpenAIRealtimeVoice({
  speechModel: {
    name: "gpt-3.5-turbo", // Example model name
    apiKey: process.env.OPENAI_API_KEY,
    language: "en-US", // Language code
  },
  listeningModel: {
    name: "whisper-1", // Example model name
    apiKey: process.env.OPENAI_API_KEY,
    format: "ogg", // Audio format
  },
  speaker: "alloy", // Example speaker name
});

For more information on the OpenAI Realtime voice provider, refer to the OpenAI Realtime Voice Reference.

Using Multiple Voice Providers

This example demonstrates how to create and use two different voice providers in Mastra: OpenAI for speech-to-text (STT) and PlayAI for text-to-speech (TTS).

Start by creating instances of the voice providers with any necessary configuration.


import { OpenAIVoice } from "@mastra/voice-openai";
import { PlayAIVoice } from "@mastra/voice-playai";
import { CompositeVoice } from "@mastra/core/voice";
import { playAudio, getMicrophoneStream } from "@mastra/node-audio";
 
// Initialize OpenAI voice for STT
const input = new OpenAIVoice({
  listeningModel: {
    name: "whisper-1",
    apiKey: process.env.OPENAI_API_KEY,
  },
});
 
// Initialize PlayAI voice for TTS
const output = new PlayAIVoice({
  speechModel: {
    name: "playai-voice",
    apiKey: process.env.PLAYAI_API_KEY,
  },
});
 
// Combine the providers using CompositeVoice
const voice = new CompositeVoice({
  input,
  output,
});
 
// Implement voice interactions using the combined voice provider
const audioStream = getMicrophoneStream(); // Assume this function gets audio input
const transcript = await voice.listen(audioStream);
 
// Log the transcribed text
console.log("Transcribed text:", transcript);
 
// Convert text to speech
const responseAudio = await voice.speak(`You said: ${transcript}`, {
  speaker: "default", // Optional: specify a speaker,
  responseFormat: "wav", // Optional: specify a response format
});
 
// Play the audio response
playAudio(responseAudio);

For more information on the CompositeVoice, refer to the CompositeVoice Reference.

Voice in Mastra

Adding Voice to Agents

Text to Speech (TTS)

OpenAI

Azure

ElevenLabs

PlayAI

Google

Cloudflare

Deepgram

Speechify

Sarvam

Murf

Speech to Text (STT)

OpenAI

Azure

ElevenLabs

Google

Cloudflare

Deepgram

Sarvam

Speech to Speech (STS)

OpenAI

Voice Configuration

OpenAI

Azure

ElevenLabs

PlayAI

Google

Cloudflare

Deepgram

Speechify

Sarvam

Murf

OpenAI Realtime

Using Multiple Voice Providers

More Resources