Giving your Agent a Voice

Mastra agents can be enhanced with voice capabilities, enabling them to speak and listen. This example demonstrates two ways to configure voice functionality:

1. Using a composite voice setup that separates input and output streams.
2. Using a unified voice provider that handles both.

Both examples use the OpenAIVoice provider for demonstration purposes.

Prerequisites

This example uses OpenAI for both the language model and the voice provider. Add OPENAI_API_KEY to your .env file:

OPENAI_API_KEY=<your-api-key>
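A missing key usually surfaces only when the first provider request fails. The following is a minimal sketch of an early check; `requireEnv` is a hypothetical helper introduced here for illustration and is not part of Mastra.

```typescript
// Hypothetical helper (not part of Mastra): fail fast when a required
// environment variable is missing or blank.
export function requireEnv(
  name: string,
  env: Record<string, string | undefined> = process.env,
): string {
  const value = env[name];
  if (value === undefined || value.trim() === "") {
    throw new Error(`Missing required environment variable: ${name}`);
  }
  return value;
}

// Example: call once at startup, after dotenv has loaded the .env file.
// requireEnv("OPENAI_API_KEY");
```

The second parameter exists only so the check can be exercised without touching the real environment.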

Installation

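npm install @mastra/voice-openai

The code on this page also imports @mastra/core, @ai-sdk/openai, and dotenv. Install them as well if your project does not already include them.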

Hybrid voice agent

This agent uses a composite voice setup that separates speech-to-text from text-to-speech. CompositeVoice lets you configure different providers for listening (input) and speaking (output). To keep this example simple, both roles are handled by the same provider: OpenAIVoice.

import { Agent } from "@mastra/core/agent";
import { CompositeVoice } from "@mastra/core/voice";
import { OpenAIVoice } from "@mastra/voice-openai";
import { openai } from "@ai-sdk/openai";

export const hybridVoiceAgent = new Agent({
  name: "hybrid-voice-agent",
  model: openai("gpt-4o"),
  instructions: "You can speak and listen using different providers.",
  voice: new CompositeVoice({
    input: new OpenAIVoice(),
    output: new OpenAIVoice(),
  }),
});

See Agent for a full list of configuration options.
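The idea behind a composite voice is delegation: listening goes to one provider and speaking goes to another. The sketch below illustrates that pattern in isolation. Every name in it (`SpeechToText`, `TextToSpeech`, `DelegatingVoice`, and the two stand-in providers) is hypothetical, and it does not mirror Mastra's actual CompositeVoice implementation or types.

```typescript
// Conceptual sketch only: these interfaces and this class are hypothetical
// and are not Mastra's CompositeVoice.
interface SpeechToText {
  listen(audio: Uint8Array): Promise<string>;
}

interface TextToSpeech {
  speak(text: string): Promise<Uint8Array>;
}

class DelegatingVoice implements SpeechToText, TextToSpeech {
  private readonly input: SpeechToText;
  private readonly output: TextToSpeech;

  constructor(parts: { input: SpeechToText; output: TextToSpeech }) {
    this.input = parts.input;
    this.output = parts.output;
  }

  // Listening is forwarded to the input provider.
  listen(audio: Uint8Array): Promise<string> {
    return this.input.listen(audio);
  }

  // Speaking is forwarded to the output provider.
  speak(text: string): Promise<Uint8Array> {
    return this.output.speak(text);
  }
}

// Stand-in providers that encode and decode UTF-8 instead of real audio.
const fakeTranscriber: SpeechToText = {
  listen: async (audio) => new TextDecoder().decode(audio),
};
const fakeSynthesizer: TextToSpeech = {
  speak: async (text) => new TextEncoder().encode(text),
};

export const demoVoice = new DelegatingVoice({
  input: fakeTranscriber,
  output: fakeSynthesizer,
});
```

Because each side is supplied independently, swapping the provider for one direction does not affect the other.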

Unified voice agent

This agent uses a single voice provider for both speech-to-text and text-to-speech. If you plan to use the same provider for both listening and speaking, this is a simpler setup. In this example, the OpenAIVoice provider handles both functions.

import { openai } from "@ai-sdk/openai";
import { Agent } from "@mastra/core/agent";
import { OpenAIVoice } from "@mastra/voice-openai";

export const unifiedVoiceAgent = new Agent({
  name: "unified-voice-agent",
  instructions: "You are an agent with both STT and TTS capabilities.",
  model: openai("gpt-4o"),
  voice: new OpenAIVoice(),
});

See Agent for a full list of configuration options.

Registering agents

To use these agents, register them in your main Mastra instance.

import { Mastra } from "@mastra/core/mastra";

import { hybridVoiceAgent } from "./agents/example-hybrid-voice-agent";
import { unifiedVoiceAgent } from "./agents/example-unified-voice-agent";

export const mastra = new Mastra({
  // ...
  agents: { hybridVoiceAgent, unifiedVoiceAgent },
});

Functions

These helper functions handle audio file operations and text conversion for the voice interaction example.

`saveAudioToFile`

This function saves an audio stream to a file in the audio directory, creating the directory if it doesn't exist.

import fs, { createWriteStream } from "fs";
import path from "path";

export const saveAudioToFile = async (
  audio: NodeJS.ReadableStream,
  filename: string,
): Promise<void> => {
  const audioDir = path.join(process.cwd(), "audio");
  const filePath = path.join(audioDir, filename);

  await fs.promises.mkdir(audioDir, { recursive: true });

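  const writer = createWriteStream(filePath);
  audio.pipe(writer);
  return new Promise((resolve, reject) => {
    // Resolve once all data has been flushed to disk.
    writer.on("finish", resolve);
    // pipe() does not forward source errors, so listen on both streams
    // to make sure the promise always settles.
    audio.on("error", reject);
    writer.on("error", reject);
  });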
};
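`pipeline` from `node:stream/promises` is a more compact way to express the same write: it resolves when the file has been fully written and rejects if either stream fails. The sketch below is an illustrative variant, not the helper used in the rest of this example; the name `saveAudioWithPipeline`, the `baseDir` parameter, and the returned path are assumptions made here.

```typescript
import { createWriteStream } from "node:fs";
import { mkdir } from "node:fs/promises";
import { join } from "node:path";
import { pipeline } from "node:stream/promises";

// Illustrative variant: the name, the baseDir parameter, and the returned
// path are assumptions and do not appear in the original example.
export const saveAudioWithPipeline = async (
  audio: NodeJS.ReadableStream,
  filename: string,
  baseDir: string = join(process.cwd(), "audio"),
): Promise<string> => {
  // Create the target directory if it does not exist yet.
  await mkdir(baseDir, { recursive: true });
  const filePath = join(baseDir, filename);

  // pipeline() resolves once the destination has finished writing and
  // rejects if either the source or the destination stream errors.
  await pipeline(audio, createWriteStream(filePath));
  return filePath;
};
```

Accepting the directory as a parameter makes it possible to write into a temporary folder when trying the helper out.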

`convertToText`

This function converts either a string or a readable stream to text, handling both input types for voice processing.

export const convertToText = async (
  input: string | NodeJS.ReadableStream,
): Promise<string> => {
  if (typeof input === "string") {
    return input;
  }

  const chunks: Buffer[] = [];
  return new Promise((resolve, reject) => {
    input.on("data", (chunk) => chunks.push(Buffer.from(chunk)));
    input.on("error", reject);
    input.on("end", () => resolve(Buffer.concat(chunks).toString("utf-8")));
  });
};
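Both branches can be exercised without calling a voice provider. The sketch below repeats `convertToText` so that it runs on its own, then feeds it a plain string and a stream whose chunks split a multi-byte UTF-8 character. Because the chunks are concatenated before decoding, the split character is still decoded correctly. The `demo` function is an illustrative name.

```typescript
import { Readable } from "node:stream";

// Copy of convertToText from above so the snippet runs on its own.
const convertToText = async (
  input: string | NodeJS.ReadableStream,
): Promise<string> => {
  if (typeof input === "string") {
    return input;
  }

  const chunks: Buffer[] = [];
  return new Promise((resolve, reject) => {
    input.on("data", (chunk) => chunks.push(Buffer.from(chunk)));
    input.on("error", reject);
    input.on("end", () => resolve(Buffer.concat(chunks).toString("utf-8")));
  });
};

// Illustrative demo: returns the result of each branch.
export const demo = async (): Promise<[string, string]> => {
  // String input is returned unchanged.
  const fromString = await convertToText("hello");

  // "é" is two bytes in UTF-8 (0xC3 0xA9); split it across two chunks.
  const stream = Readable.from([
    Buffer.from([0x63, 0x61, 0x66, 0xc3]), // "caf" plus the first byte of "é"
    Buffer.from([0xa9]), // the second byte of "é"
  ]);
  const fromStream = await convertToText(stream);

  return [fromString, fromStream]; // ["hello", "café"]
};
```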

Example usage

This example demonstrates a voice interaction between two agents. The hybrid voice agent speaks a question, which is saved as an audio file. The unified voice agent listens to that file, processes the question, generates a response, and speaks it back. Both audio outputs are saved to the audio directory.

The following files are created:

hybrid-question.mp3 – Hybrid agent's spoken question.
unified-response.mp3 – Unified agent's spoken response.

import "dotenv/config";

import path from "path";
import { createReadStream } from "fs";
import { mastra } from "./mastra";

import { saveAudioToFile } from "./mastra/utils/save-audio-to-file";
import { convertToText } from "./mastra/utils/convert-to-text";

const hybridVoiceAgent = mastra.getAgent("hybridVoiceAgent");
const unifiedVoiceAgent = mastra.getAgent("unifiedVoiceAgent");

const question = "What is the meaning of life in one sentence?";

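// 1. The hybrid agent speaks the question; save the audio to disk.
const hybridSpoken = await hybridVoiceAgent.voice.speak(question);

await saveAudioToFile(hybridSpoken!, "hybrid-question.mp3");

// 2. The unified agent transcribes the saved audio file.
const audioStream = createReadStream(
  path.join(process.cwd(), "audio", "hybrid-question.mp3"),
);
const unifiedHeard = await unifiedVoiceAgent.voice.listen(audioStream);

// The transcription may arrive as a string or a stream; normalize it to a string.
const inputText = await convertToText(unifiedHeard!);

// 3. The unified agent answers the question, then speaks and saves the response.
const unifiedResponse = await unifiedVoiceAgent.generate(inputText);
const unifiedSpoken = await unifiedVoiceAgent.voice.speak(unifiedResponse.text);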

await saveAudioToFile(unifiedSpoken!, "unified-response.mp3");
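The non-null assertions (`!`) in the script suggest that `speak()` and `listen()` can resolve without a value, and in that case the failure would appear later with a less helpful message. A minimal sketch of a runtime guard follows, assuming that behavior; `requireValue` is a hypothetical helper and is not part of Mastra.

```typescript
// Hypothetical helper (not part of Mastra): replace a non-null assertion
// with a runtime check that throws a descriptive error.
export function requireValue<T>(
  value: T | null | undefined | void,
  label: string,
): T {
  if (value === null || value === undefined) {
    throw new Error(`${label} returned no value`);
  }
  return value as T;
}

// Possible usage in the script above (shown as a comment because it needs
// the registered agents):
// const audio = requireValue(await hybridVoiceAgent.voice.speak(question), "speak()");
// await saveAudioToFile(audio, "hybrid-question.mp3");
```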
