
Voice in Mastra

Mastra’s Voice system provides a unified interface for voice interactions, enabling text-to-speech (TTS), speech-to-text (STT), and real-time voice-to-voice capabilities in your applications.

Key Features

  • Standardized API across different voice providers
  • Support for multiple voice services
  • Voice-to-voice interactions using events for continuous audio streaming
  • Composable voice providers for mixing TTS and STT services

Adding Voice to Agents

To learn how to integrate voice capabilities into your agents, see the Adding Voice to Agents documentation. That guide covers single and multiple voice providers as well as real-time interactions; a minimal sketch of the basic pattern follows.
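An agent takes a voice provider at construction and exposes it as agent.voice. A minimal sketch, assuming the OpenAI voice provider and an @ai-sdk/openai model (the agent name and instructions are illustrative):

import { Agent } from "@mastra/core/agent";
import { openai } from "@ai-sdk/openai";
import { OpenAIVoice } from "@mastra/voice-openai";

// Illustrative agent wired with a voice provider
const agent = new Agent({
  name: "voice-assistant",
  instructions: "You are a helpful voice assistant.",
  model: openai("gpt-4o"),
  voice: new OpenAIVoice(),
});

// The agent exposes the provider's speak/listen interface
const greeting = await agent.voice.speak("Hello! How can I help?");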

Example of Using a Single Voice Provider

import { OpenAIVoice } from "@mastra/voice-openai";
import { playAudio } from "@mastra/node-audio";
 
// Initialize OpenAI voice for TTS
const voice = new OpenAIVoice({
  speechModel: {
    name: "tts-1-hd", // Specify the TTS model
    apiKey: process.env.OPENAI_API_KEY, // Your OpenAI API key
  },
});
 
// Convert text to speech
const audioStream = await voice.speak("Hello! How can I assist you today?", {
  speaker: "default", // Optional: specify a speaker
});
 
// Play the audio response
playAudio(audioStream);

Example of Using Multiple Voice Providers

This example demonstrates how to create and use two different voice providers in Mastra: OpenAI for speech-to-text (STT) and PlayAI for text-to-speech (TTS).

Start by creating instances of the voice providers with any necessary configuration.

import { OpenAIVoice } from "@mastra/voice-openai";
import { PlayAIVoice } from "@mastra/voice-playai";
import { CompositeVoice } from "@mastra/core/voice";
import { getMicrophoneStream, playAudio } from "@mastra/node-audio";
 
// Initialize OpenAI voice for STT
const listeningProvider = new OpenAIVoice({
  listeningModel: {
    name: "whisper-1",
    apiKey: process.env.OPENAI_API_KEY,
  },
});
 
// Initialize PlayAI voice for TTS
const speakingProvider = new PlayAIVoice({
  speechModel: {
    name: "playai-voice",
    apiKey: process.env.PLAYAI_API_KEY,
  },
});
 
// Combine the providers using CompositeVoice
const voice = new CompositeVoice({
  listeningProvider,
  speakingProvider,
});
 
// Implement voice interactions using the combined voice provider
const audioStream = getMicrophoneStream(); // Capture audio input from the microphone
const transcript = await voice.listen(audioStream);
 
// Log the transcribed text
console.log("Transcribed text:", transcript);
 
// Convert text to speech
const responseAudio = await voice.speak(`You said: ${transcript}`, {
  speaker: "default", // Optional: specify a speaker
});
 
// Play the audio response
playAudio(responseAudio);
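
Because CompositeVoice routes listen() calls to the listening provider and speak() calls to the speaking provider, the interaction code above stays provider-agnostic: swapping either service only changes the construction step.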

Real-time Capabilities

Many voice providers support real-time speech-to-speech interactions through WebSocket connections, enabling:

  • Live voice conversations with AI
  • Streaming transcription
  • Real-time text-to-speech synthesis
  • Tool usage during conversations
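
For example, a speech-to-speech session with OpenAI's realtime provider looks roughly like the sketch below; it assumes @mastra/voice-openai-realtime and the audio helpers from @mastra/node-audio, and event names may vary by provider:

import { OpenAIRealtimeVoice } from "@mastra/voice-openai-realtime";
import { getMicrophoneStream, playAudio } from "@mastra/node-audio";

const voice = new OpenAIRealtimeVoice();

// Open the WebSocket connection to the realtime service
await voice.connect();

// Play audio as the model speaks (event name assumed from the OpenAI realtime provider)
voice.on("speaker", (stream) => {
  playAudio(stream);
});

// Start the conversation, then stream microphone input continuously
await voice.speak("How can I help you today?");
await voice.send(getMicrophoneStream());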

Voice Configuration

Voice providers can be configured with different models and options:

import { OpenAIVoice } from "@mastra/voice-openai";

const voice = new OpenAIVoice({
  speechModel: {
    name: "tts-1-hd", // Text-to-speech model
    apiKey: process.env.OPENAI_API_KEY,
  },
  listeningModel: {
    name: "whisper-1", // Speech-to-text model
  },
  speaker: "alloy", // Default voice for speech synthesis
});
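
Options set at construction act as defaults; individual calls can override them, for example by passing a different speaker to speak(). A small sketch (the "nova" voice name is illustrative):

// Override the default speaker for a single call
const stream = await voice.speak("Welcome back!", { speaker: "nova" });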

Available Voice Providers

Mastra supports a variety of voice providers, including:

  • OpenAI
  • PlayAI
  • Murf
  • ElevenLabs
  • …and more
