
Giving your Agent a Voice

Mastra agents can be enhanced with voice capabilities, enabling them to speak and listen. This example demonstrates two ways to configure voice functionality:

  1. Using a composite voice setup that separates input and output streams,
  2. Using a unified voice provider that handles both.

Both examples use the OpenAIVoice provider for demonstration purposes.

Prerequisites

This example uses OpenAI's gpt-4o model. Make sure to add OPENAI_API_KEY to your .env file.

.env
OPENAI_API_KEY=<your-api-key>

Installation

npm install @mastra/voice-openai
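
The agents and scripts below also import from @mastra/core and @ai-sdk/openai, and the example script loads environment variables with dotenv. If these packages aren't already part of your project, install them as well:

npm install @mastra/core @ai-sdk/openai dotenv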

Hybrid voice agent

This agent uses a composite voice setup that separates speech-to-text and text-to-speech functionality. The CompositeVoice allows you to configure different providers for listening (input) and speaking (output). However, in this example, both are handled by the same provider: OpenAIVoice.

src/mastra/agents/example-hybrid-voice-agent.ts
import { Agent } from "@mastra/core/agent";
import { CompositeVoice } from "@mastra/core/voice";
import { OpenAIVoice } from "@mastra/voice-openai";
import { openai } from "@ai-sdk/openai";

export const hybridVoiceAgent = new Agent({
  name: "hybrid-voice-agent",
  model: openai("gpt-4o"),
  instructions: "You can speak and listen using different providers.",
  voice: new CompositeVoice({
    input: new OpenAIVoice(),
    output: new OpenAIVoice()
  })
});

See Agent for a full list of configuration options.
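
To see why the composite setup is useful, consider mixing providers. The sketch below assumes a second voice package is installed (here @mastra/voice-deepgram with a DeepgramVoice class; substitute whichever provider you actually use) and routes listening and speaking to different services:

import { Agent } from "@mastra/core/agent";
import { CompositeVoice } from "@mastra/core/voice";
import { OpenAIVoice } from "@mastra/voice-openai";
// Assumed provider package; swap in any supported speech-to-text provider
import { DeepgramVoice } from "@mastra/voice-deepgram";
import { openai } from "@ai-sdk/openai";

export const mixedVoiceAgent = new Agent({
  name: "mixed-voice-agent",
  model: openai("gpt-4o"),
  instructions: "You listen with one provider and speak with another.",
  voice: new CompositeVoice({
    input: new DeepgramVoice(),  // speech-to-text
    output: new OpenAIVoice()    // text-to-speech
  })
});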

Unified voice agent

This agent uses a single voice provider for both speech-to-text and text-to-speech. If you plan to use the same provider for both listening and speaking, this is a simpler setup. In this example, the OpenAIVoice provider handles both functions.

src/mastra/agents/example-unified-voice-agent.ts
import { openai } from "@ai-sdk/openai";
import { Agent } from "@mastra/core/agent";
import { OpenAIVoice } from "@mastra/voice-openai";

export const unifiedVoiceAgent = new Agent({
  name: "unified-voice-agent",
  instructions: "You are an agent with both STT and TTS capabilities.",
  model: openai("gpt-4o"),
  voice: new OpenAIVoice()
});

See Agent for a full list of configuration options.

Registering agents

To use these agents, register them in your main Mastra instance.

src/mastra/index.ts
import { Mastra } from "@mastra/core/mastra";

import { hybridVoiceAgent } from "./agents/example-hybrid-voice-agent";
import { unifiedVoiceAgent } from "./agents/example-unified-voice-agent";

export const mastra = new Mastra({
  // ...
  agents: { hybridVoiceAgent, unifiedVoiceAgent }
});

Functions

These helper functions handle audio file operations and text conversion for the voice interaction example.

saveAudioToFile

This function saves an audio stream to a file in the audio directory, creating the directory if it doesn’t exist.

src/mastra/utils/save-audio-to-file.ts
import fs, { createWriteStream } from "fs";
import path from "path";

export const saveAudioToFile = async (audio: NodeJS.ReadableStream, filename: string): Promise<void> => {
  const audioDir = path.join(process.cwd(), "audio");
  const filePath = path.join(audioDir, filename);

  // Create the audio directory if it doesn't already exist
  await fs.promises.mkdir(audioDir, { recursive: true });

  // Pipe the audio stream to disk and resolve once it's fully written
  const writer = createWriteStream(filePath);
  audio.pipe(writer);

  return new Promise((resolve, reject) => {
    writer.on("finish", resolve);
    writer.on("error", reject);
  });
};

convertToText

This function converts either a string or a readable stream to text, handling both input types for voice processing.

src/mastra/utils/convert-to-text.ts
export const convertToText = async (input: string | NodeJS.ReadableStream): Promise<string> => {
  // Strings pass through unchanged
  if (typeof input === "string") {
    return input;
  }

  // Collect the stream's chunks, then decode them as UTF-8
  const chunks: Buffer[] = [];

  return new Promise((resolve, reject) => {
    input.on("data", (chunk) => chunks.push(Buffer.from(chunk)));
    input.on("error", reject);
    input.on("end", () => resolve(Buffer.concat(chunks).toString("utf-8")));
  });
};

Example usage

This example demonstrates a voice interaction between two agents. The hybrid voice agent speaks a question, which is saved as an audio file. The unified voice agent listens to that file, processes the question, generates a response, and speaks it back. Both audio outputs are saved to the audio directory.

The following files are created:

  • hybrid-question.mp3 – Hybrid agent’s spoken question.
  • unified-response.mp3 – Unified agent’s spoken response.
src/test-voice-agents.ts
import "dotenv/config"; import path from "path"; import { createReadStream } from "fs"; import { mastra } from "./mastra"; import { saveAudioToFile } from "./mastra/utils/save-audio-to-file"; import { convertToText } from "./mastra/utils/convert-to-text"; const hybridVoiceAgent = mastra.getAgent("hybridVoiceAgent"); const unifiedVoiceAgent = mastra.getAgent("unifiedVoiceAgent"); const question = "What is the meaning of life in one sentence?"; const hybridSpoken = await hybridVoiceAgent.voice.speak(question); await saveAudioToFile(hybridSpoken!, "hybrid-question.mp3"); const audioStream = createReadStream(path.join(process.cwd(), "audio", "hybrid-question.mp3")); const unifiedHeard = await unifiedVoiceAgent.voice.listen(audioStream); const inputText = await convertToText(unifiedHeard!); const unifiedResponse = await unifiedVoiceAgent.generate(inputText); const unifiedSpoken = await unifiedVoiceAgent.voice.speak(unifiedResponse.text); await saveAudioToFile(unifiedSpoken!, "unified-response.mp3");