Adding Voice to Agents

Mastra agents can be enhanced with voice capabilities, allowing them to speak responses and listen to user input. You can configure an agent to use either a single voice provider or combine multiple providers for different operations.

Using a Single Provider

The simplest way to add voice to an agent is to use a single provider for both speaking and listening:

import { openai } from "@ai-sdk/openai";
import { Agent } from "@mastra/core/agent";
import { OpenAIVoice } from "@mastra/voice-openai";
 
// Initialize the voice provider with default settings
const voice = new OpenAIVoice();
 
// Create an agent with voice capabilities
export const agent = new Agent({
  name: 'Agent',
  instructions: `You are a helpful assistant with both STT and TTS capabilities.`,
  model: openai('gpt-4o'),
  voice
});
 
// The agent can now use voice for interaction
await agent.speak("Hello, I'm your AI assistant!");
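// Shown without an argument for brevity; when transcribing recorded audio,
// pass a readable stream (see "Transcribing Audio Input").
const userInput = await agent.listen();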

Using Multiple Providers

For more flexibility, the CompositeVoice class lets you assign one provider to speaking and a different one to listening:

import { openai } from "@ai-sdk/openai";
import { Agent } from "@mastra/core/agent";
import { CompositeVoice } from "@mastra/core/voice";
import { OpenAIVoice } from "@mastra/voice-openai";
import { PlayAIVoice } from "@mastra/voice-playai";
 
export const agent = new Agent({
  name: 'Agent',
  instructions: `You are a helpful assistant with both STT and TTS capabilities.`,
  model: openai('gpt-4o'),
 
  // Create a composite voice using OpenAI for listening and PlayAI for speaking
  voice: new CompositeVoice({
    listenProvider: new OpenAIVoice(),
    speakProvider: new PlayAIVoice(),
  }),
});
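
To make the routing idea concrete, here is a minimal, self-contained sketch of how a composite can delegate each operation to a different provider. It is not Mastra's implementation: the `Speaker` and `Listener` interfaces, the `SketchCompositeVoice` class, and the fake providers are all hypothetical names introduced for illustration, and the fake providers stand in for real services so the code runs without API keys.

```typescript
import { Readable } from "node:stream";

// Hypothetical minimal interfaces, for illustration only (not Mastra's types).
interface Speaker {
  speak(text: string): Promise<Readable>;
}
interface Listener {
  listen(audio: Readable): Promise<string>;
}

// Routes each operation to the provider configured for it.
class SketchCompositeVoice implements Speaker, Listener {
  private readonly speakProvider: Speaker;
  private readonly listenProvider: Listener;

  constructor(providers: { speakProvider: Speaker; listenProvider: Listener }) {
    this.speakProvider = providers.speakProvider;
    this.listenProvider = providers.listenProvider;
  }

  speak(text: string): Promise<Readable> {
    return this.speakProvider.speak(text);
  }

  listen(audio: Readable): Promise<string> {
    return this.listenProvider.listen(audio);
  }
}

// Stand-in providers so the sketch runs without network access or API keys.
const fakeSpeaker: Speaker = {
  speak: async (text) => Readable.from([Buffer.from(`audio:${text}`)]),
};
const fakeListener: Listener = {
  listen: async (audio) => {
    const chunks: Buffer[] = [];
    for await (const chunk of audio) chunks.push(Buffer.from(chunk));
    return Buffer.concat(chunks).toString("utf8");
  },
};

const sketchVoice = new SketchCompositeVoice({
  speakProvider: fakeSpeaker,
  listenProvider: fakeListener,
});
```

Because the agent only sees one voice object, you can swap either provider without touching the code that calls speak() or listen().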

Working with Audio Streams

The speak() and listen() methods work with Node.js streams: speak() returns an audio stream, and listen() accepts one as input. Here's how to save speech to a file and transcribe audio from a file:

Saving Speech Output

import { createWriteStream } from 'fs';
import path from 'path';
 
// Generate speech and save to file
const audio = await agent.speak('Hello, World!');
const filePath = path.join(process.cwd(), 'agent.mp3');
const writer = createWriteStream(filePath);
 
audio.pipe(writer);
 
await new Promise<void>((resolve, reject) => {
  writer.on('finish', () => resolve());
  writer.on('error', reject);
});
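
As an alternative to wiring up the finish and error events by hand, the same save step can be sketched with `pipeline` from `node:stream/promises`, which also rejects if the source stream fails. The `saveAudio` helper and the `demoSave` function are hypothetical names, and the guard for a missing stream assumes that a provider might return nothing when it produces no audio. The demo uses a stand-in stream in place of `await agent.speak(...)` so it runs offline.

```typescript
import { createWriteStream } from "node:fs";
import { readFile } from "node:fs/promises";
import { tmpdir } from "node:os";
import { join } from "node:path";
import { Readable } from "node:stream";
import { pipeline } from "node:stream/promises";

// Hypothetical helper: writes an audio stream to disk and resolves once the
// file is fully written. Rejects if the source or the file stream errors.
async function saveAudio(
  audio: NodeJS.ReadableStream | null | undefined,
  filePath: string,
): Promise<string> {
  if (!audio) {
    // Assumption: a provider may return nothing when no audio was produced.
    throw new Error("No audio stream was returned");
  }
  await pipeline(audio, createWriteStream(filePath));
  return filePath;
}

// Demo with a stand-in stream; in practice the source would be the result
// of `await agent.speak(...)`.
async function demoSave(): Promise<string> {
  const fakeAudio = Readable.from([Buffer.from("fake "), Buffer.from("audio")]);
  const target = join(tmpdir(), `agent-sketch-${process.pid}.mp3`);
  await saveAudio(fakeAudio, target);
  return readFile(target, "utf8");
}
```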

Transcribing Audio Input

import { createReadStream } from 'fs';
import path from 'path';
 
// Read audio file and transcribe
const audioFilePath = path.join(process.cwd(), 'agent.mp3');
const audioStream = createReadStream(audioFilePath);
 
try {
  console.log('Transcribing audio file...');
  const transcription = await agent.listen(audioStream);
  console.log('Transcription:', transcription);
} catch (error) {
  console.error('Error transcribing audio:', error);
}
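
If you need the transcription as a plain string regardless of provider, a small normalizing step can help. The sketch below assumes that listen() may resolve with either a string or a stream of text chunks depending on the provider; check your provider's return type before relying on this. `transcriptionToText` and `demoTranscription` are hypothetical names, and the demo feeds in a stand-in stream rather than a real transcription.

```typescript
import { Readable } from "node:stream";

// Hypothetical helper: returns the transcription text whether the provider
// resolved with a plain string or with a stream of text chunks.
async function transcriptionToText(
  result: string | NodeJS.ReadableStream | null | undefined,
): Promise<string> {
  if (result == null) return "";
  if (typeof result === "string") return result;

  // Collect raw bytes first and decode once, so multi-byte characters that
  // span two chunks are not corrupted.
  const chunks: Buffer[] = [];
  for await (const chunk of result) {
    chunks.push(typeof chunk === "string" ? Buffer.from(chunk, "utf8") : Buffer.from(chunk));
  }
  return Buffer.concat(chunks).toString("utf8");
}

// Demo with a stand-in stream; in practice the input would be the result
// of `await agent.listen(audioStream)`.
async function demoTranscription(): Promise<string> {
  const fakeResult = Readable.from(["Hello, ", "World!"]);
  return transcriptionToText(fakeResult);
}
```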