Adding Voice to Agents
Mastra agents can be given voice capabilities, allowing them to speak responses (text-to-speech, TTS) and listen to user input (speech-to-text, STT). You can configure an agent with a single voice provider for both operations, or combine multiple providers so that each operation uses a different one.
Using a Single Provider
The simplest way to add voice to an agent is to use a single provider for both speaking and listening:
```typescript
import { Agent } from "@mastra/core/agent";
import { OpenAIVoice } from "@mastra/voice-openai";
import { openai } from "@ai-sdk/openai";
import { createReadStream } from "fs";

// Initialize the voice provider with default settings
const voice = new OpenAIVoice();

// Create an agent with voice capabilities
export const agent = new Agent({
  name: 'Agent',
  instructions: `You are a helpful assistant with both STT and TTS capabilities.`,
  model: openai('gpt-4o'),
  voice,
});

// The agent can now use voice for interaction
await agent.speak("Hello, I'm your AI assistant!");
// listen() transcribes an audio stream; "user-input.mp3" is a placeholder recording
const userInput = await agent.listen(createReadStream("user-input.mp3"));
```
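To make the speak/listen contract concrete, here is a minimal sketch of what a single provider handles, assuming a simplified interface in which `speak()` returns a Node.js readable stream and `listen()` consumes one. `VoiceProviderSketch` and `EchoVoice` are hypothetical names invented for this illustration, not Mastra exports, and `EchoVoice` emits UTF-8 text bytes instead of real audio so it runs without an API key.

```typescript
import { Readable } from "node:stream";

// Hypothetical contract for illustration only; this is NOT Mastra's voice API.
interface VoiceProviderSketch {
  speak(text: string): Promise<Readable>;
  listen(audio: Readable): Promise<string>;
}

// Fake provider: "synthesizes" speech by emitting the text as UTF-8 bytes
// and "transcribes" by decoding those bytes back into a string.
class EchoVoice implements VoiceProviderSketch {
  async speak(text: string): Promise<Readable> {
    return Readable.from([Buffer.from(text, "utf8")]);
  }

  async listen(audio: Readable): Promise<string> {
    const chunks: Buffer[] = [];
    for await (const chunk of audio) {
      chunks.push(Buffer.isBuffer(chunk) ? chunk : Buffer.from(chunk));
    }
    return Buffer.concat(chunks).toString("utf8");
  }
}
```

Because one object implements both methods, `await voice.listen(await voice.speak("Hello"))` resolves to `"Hello"` for an `EchoVoice` instance; a real provider such as `OpenAIVoice` would synthesize and transcribe actual audio instead.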
Using Multiple Providers
For more flexibility, you can use the CompositeVoice class to assign different providers to speaking and listening:
```typescript
import { Agent } from "@mastra/core/agent";
import { CompositeVoice } from "@mastra/core/voice";
import { OpenAIVoice } from "@mastra/voice-openai";
import { PlayAIVoice } from "@mastra/voice-playai";
import { openai } from "@ai-sdk/openai";

export const agent = new Agent({
  name: 'Agent',
  instructions: `You are a helpful assistant with both STT and TTS capabilities.`,
  model: openai('gpt-4o'),

  // Create a composite voice using OpenAI for listening and PlayAI for speaking
  voice: new CompositeVoice({
    listenProvider: new OpenAIVoice(),
    speakProvider: new PlayAIVoice(),
  }),
});
```
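Conceptually, a composite voice routes each call to the provider configured for that operation. The sketch below illustrates the idea under a simplified, hypothetical interface; `VoiceProviderSketch`, `CompositeVoiceSketch`, and `RecordingVoice` are illustrative names, and this is not Mastra's actual `CompositeVoice` implementation.

```typescript
import { Readable } from "node:stream";

// Hypothetical minimal contract for illustration only (not Mastra's API).
interface VoiceProviderSketch {
  speak(text: string): Promise<Readable>;
  listen(audio: Readable): Promise<string>;
}

// Routes speak() to one provider and listen() to another, mirroring the
// speakProvider/listenProvider options passed to CompositeVoice above.
class CompositeVoiceSketch implements VoiceProviderSketch {
  private readonly speakProvider: VoiceProviderSketch;
  private readonly listenProvider: VoiceProviderSketch;

  constructor(opts: { speakProvider: VoiceProviderSketch; listenProvider: VoiceProviderSketch }) {
    this.speakProvider = opts.speakProvider;
    this.listenProvider = opts.listenProvider;
  }

  speak(text: string): Promise<Readable> {
    return this.speakProvider.speak(text);
  }

  listen(audio: Readable): Promise<string> {
    return this.listenProvider.listen(audio);
  }
}

// Stub provider that records which operations were routed to it.
class RecordingVoice implements VoiceProviderSketch {
  readonly name: string;
  readonly calls: string[] = [];

  constructor(name: string) {
    this.name = name;
  }

  async speak(text: string): Promise<Readable> {
    this.calls.push("speak");
    // Placeholder bytes standing in for synthesized audio.
    return Readable.from([Buffer.from(`${this.name}:${text}`, "utf8")]);
  }

  async listen(_audio: Readable): Promise<string> {
    this.calls.push("listen");
    return `${this.name} transcript`;
  }
}
```

With one stub as `speakProvider` and another as `listenProvider`, `speak()` calls land only on the first and `listen()` calls only on the second, which is the split the OpenAI/PlayAI configuration above depends on.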
Working with Audio Streams
The `speak()` and `listen()` methods work with Node.js streams: `speak()` returns a readable audio stream, and `listen()` takes one as input. Here's how to save and load audio files:
Saving Speech Output
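```typescript
import { createWriteStream } from 'fs';
import path from 'path';

// Generate speech and save to file
const audio = await agent.speak('Hello, World!');
if (!audio) {
  // Guard in case the provider did not return a stream
  throw new Error('No audio stream returned by speak()');
}
const filePath = path.join(process.cwd(), 'agent.mp3');
const writer = createWriteStream(filePath);
audio.pipe(writer);

// Wait until the file has been fully written (or fails)
await new Promise<void>((resolve, reject) => {
  writer.on('finish', () => resolve());
  writer.on('error', reject);
});
```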
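As an alternative to wrapping the write stream's events in a `Promise` by hand, Node.js provides `pipeline()` in `node:stream/promises`, which resolves once the destination has finished writing and rejects (destroying both streams) if either side errors. `saveAudio` is a hypothetical helper name, and the sketch assumes you already have the readable stream returned by `agent.speak()`.

```typescript
import { createWriteStream } from "node:fs";
import { pipeline } from "node:stream/promises";

// Hypothetical helper: copies an audio stream to a file.
// pipeline() waits for the write to finish and propagates errors
// from either the source stream or the file stream.
async function saveAudio(audio: NodeJS.ReadableStream, filePath: string): Promise<void> {
  await pipeline(audio, createWriteStream(filePath));
}
```

For example, `await saveAudio(audio, filePath)` could stand in for the `pipe()` call and the manual `Promise` in the snippet above.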
Transcribing Audio Input
```typescript
import { createReadStream } from 'fs';
import path from 'path';

// Read audio file and transcribe
const audioFilePath = path.join(process.cwd(), 'agent.mp3');
const audioStream = createReadStream(audioFilePath);

try {
  console.log('Transcribing audio file...');
  const transcription = await agent.listen(audioStream);
  console.log('Transcription:', transcription);
} catch (error) {
  console.error('Error transcribing audio:', error);
}
```
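If you transcribe many files, it can help to guarantee that each file handle is released even when transcription fails. The sketch below assumes a `listen` function that takes a readable audio stream and resolves to the transcribed text; `transcribeFile` and `ListenFn` are hypothetical names, and in practice you would pass a small wrapper around `agent.listen()` that returns the text.

```typescript
import { createReadStream } from "node:fs";

// Assumed shape of a transcription function (hypothetical, for illustration):
// it consumes an audio stream and resolves to plain text.
type ListenFn = (audio: NodeJS.ReadableStream) => Promise<string>;

// Opens the file, hands the stream to `listen`, and always destroys the
// stream afterwards so the file handle is released on success or failure.
async function transcribeFile(listen: ListenFn, filePath: string): Promise<string> {
  const stream = createReadStream(filePath);
  try {
    return await listen(stream);
  } finally {
    stream.destroy();
  }
}
```

Keeping `listen` as a parameter also lets you test the file handling with a stub instead of a live speech-to-text provider.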