Introducing TTS in Mastra

Jan 20, 2025

Building Text-to-Speech Applications with Mastra

We recently shipped a TTS module that integrates with OpenAI and ElevenLabs speech models.

Let's explore how to use it.

Basic Setup

First, install the required package:

npm install @mastra/tts

Configure your environment:

OPENAI_API_KEY=your_api_key_here
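
If you plan to use ElevenLabs voices instead, you'll also need its API key. We're assuming the variable name below; check the package docs for the exact one:

ELEVENLABS_API_KEY=your_api_key_here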

Basic TTS Usage

Initialize the TTS client:

import { OpenAITTS } from "@mastra/tts";

const tts = new OpenAITTS({
  model: {
    name: "tts-1",
  },
});
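
The package also supports ElevenLabs speech models. As a rough sketch (the ElevenLabsTTS class name and model identifier below are assumptions; check the package exports for the exact API):

import { ElevenLabsTTS } from "@mastra/tts";

const elevenLabsTts = new ElevenLabsTTS({
  model: {
    name: "eleven_multilingual_v2", // assumed model identifier
  },
});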

Voice Selection

OpenAI provides several voices to choose from:

const voices = await tts.voices();
// Available voices: alloy, echo, fable, onyx, nova, shimmer
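
If you want to pick a voice at runtime, you can check the returned list first. This sketch assumes each entry exposes a voice field with the voice id; adjust to the actual return shape:

// fall back to "alloy" if the preferred voice isn't available
const preferred = "nova";
const voiceId = voices.some((v) => v.voice === preferred) ? preferred : "alloy";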

Generate Speech

Generate audio with your chosen voice:

const { audioResult } = await tts.generate({
  text: "Hello, world!",
  voice: "nova",
  speed: 1.0,
});
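
To save the result locally, you can write it out with Node's fs module. This assumes audioResult is (or can be converted to) a Buffer; if it's a stream, use the piping approach shown in the next section:

import { writeFileSync } from "node:fs";

// write the generated audio to disk
writeFileSync("hello.mp3", audioResult);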

Streaming Audio

For real-time audio streaming:

const { audioResult } = await tts.stream({
  text: "This is a streaming response",
  voice: "alloy",
  speed: 1.2,
});

// audioResult is a PassThrough stream
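
Because audioResult is a standard Node.js PassThrough stream, you can pipe it anywhere a writable stream is accepted, for example straight to a file:

import { createWriteStream } from "node:fs";
import { pipeline } from "node:stream/promises";

// stream the audio to disk as it arrives
await pipeline(audioResult, createWriteStream("streamed.mp3"));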

Error Handling and Telemetry

The TTS system includes built-in telemetry and error tracing, so you can use your favorite tracing tools to get visibility into your TTS usage.
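
Since generate() and stream() are ordinary async functions, you can wrap calls in a try/catch in the usual way and lean on the built-in tracing for deeper visibility:

try {
  const { audioResult } = await tts.generate({
    text: "Hello, world!",
    voice: "nova",
  });
} catch (err) {
  // handle or re-throw; the error is also captured by telemetry
  console.error("TTS generation failed", err);
}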

Usage with Mastra

Integrate TTS with your Mastra application:

import { Mastra } from "@mastra/core";
import { OpenAITTS } from "@mastra/tts";

const tts = new OpenAITTS({
  model: {
    name: "tts-1",
  },
});

const mastra = new Mastra({
  tts,
});

// Generate speech
const audio = await mastra.tts.generate({
  text: "Welcome to Mastra",
  voice: "nova",
});
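
In a server context you can pipe the streaming variant straight to the HTTP response. A minimal Express sketch, assuming mastra.tts.stream mirrors the standalone stream() API above and the default output is mp3:

import express from "express";

const app = express();

app.get("/speak", async (req, res) => {
  const { audioResult } = await mastra.tts.stream({
    text: "Welcome to Mastra",
    voice: "nova",
  });
  res.setHeader("Content-Type", "audio/mpeg");
  audioResult.pipe(res); // stream audio to the client as it's generated
});

app.listen(3000);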

The Mastra TTS system provides type-safe speech generation with telemetry and error handling. Start with basic generation and add streaming as needed.

Next Steps: Exposing TTS to Agents

One thing we're thinking about is how to expose TTS to agents.

Currently, our thought is to let agents optionally be configured with a TTS model; agent.tts.generate() and agent.tts.stream() would then be available, along with /agents/$AGENT_ID/tts/generate and /agents/$AGENT_ID/tts/stream endpoints.
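
None of this is implemented yet, but here's a purely hypothetical sketch of what that surface might look like:

import { Agent } from "@mastra/core";

// hypothetical: agents accept an optional TTS model in their config
const agent = new Agent({
  name: "voice-agent",
  instructions: "You are a helpful assistant.",
  tts, // the OpenAITTS instance from earlier
});

// hypothetical: per-agent speech generation
const { audioResult } = await agent.tts.generate({
  text: "Hello from your agent",
  voice: "nova",
});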

Some other questions:

  • How should we expose this functionality in the mastra dev UI?

We figured we would embed a sound clip in the chat UI for agents that have a TTS model configured.

  • How should we expose this functionality in agent memory?

We figured we would probably add a new tts field to items in agent memory, and then we could store the TTS model name there.
