
Introducing TTS in Mastra

Jan 20, 2025

Building Text-to-Speech Applications with Mastra

We recently shipped a TTS module that integrates with OpenAI and ElevenLabs speech models.

Let's explore how to use it.

Basic Setup

First, install the required package:

npm install @mastra/tts

Configure your environment:

OPENAI_API_KEY=your_api_key_here

Basic TTS Usage

Initialize the TTS client:

import { OpenAITTS } from "@mastra/tts";

const tts = new OpenAITTS({
  model: {
    name: "tts-1",
  },
});

Voice Selection

OpenAI provides several voices to choose from:

const voices = await tts.voices();
// Available voices: alloy, echo, fable, onyx, nova, shimmer

Generate Speech

Generate audio with your chosen voice:

const { audioResult } = await tts.generate({
  text: "Hello, world!",
  voice: "nova",
  speed: 1.0,
});
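
Once generation completes, you can persist the result. Here's a minimal sketch, assuming audioResult is a Node.js Buffer; if your version of @mastra/tts returns a stream here instead, pipe it as shown in the streaming section below:

import { writeFileSync } from "node:fs";

// Assumption: audioResult is a Node.js Buffer of encoded audio.
// The hello.mp3 filename (and MP3 as the format) are placeholders.
writeFileSync("hello.mp3", audioResult);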

Streaming Audio

For real-time audio streaming:

const { audioResult } = await tts.stream({
  text: "This is a streaming response",
  voice: "alloy",
  speed: 1.2,
});

// audioResult is a PassThrough stream
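
Because audioResult is a PassThrough stream, you can pipe it anywhere Node.js streams are accepted. Here's a minimal sketch writing it to a local file; the speech.mp3 filename and MP3 output format are assumptions:

import { createWriteStream } from "node:fs";
import { pipeline } from "node:stream/promises";

// Pipe the PassThrough stream to disk and await completion.
// speech.mp3 is a placeholder filename.
await pipeline(audioResult, createWriteStream("speech.mp3"));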

Error Handling and Telemetry

The TTS system includes built-in telemetry and error tracing, so you can use your favorite tracing tools to get visibility into your TTS usage.
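
Generation calls can still fail (bad API key, unsupported voice, network errors), so it's worth catching errors at the call site. A minimal sketch; the exact error type thrown by @mastra/tts isn't specified here, so it's handled generically:

try {
  const { audioResult } = await tts.generate({
    text: "Hello, world!",
    voice: "nova",
  });
  // ...play or persist audioResult
} catch (err) {
  // The built-in telemetry records the failed call; we just log it here.
  console.error("TTS generation failed:", err);
}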

Usage with Mastra

Integrate TTS with your Mastra application:

import { Mastra } from "@mastra/core";
import { OpenAITTS } from "@mastra/tts";

const tts = new OpenAITTS({
  model: {
    name: "tts-1",
  },
});

const mastra = new Mastra({
  tts,
});

// Generate speech
const audio = await mastra.tts.generate({
  text: "Welcome to Mastra",
  voice: "nova",
});

The Mastra TTS system provides type-safe speech generation with telemetry and error handling. Start with basic generation and add streaming as needed.

Next Steps: Exposing TTS to Agents

One thing we're thinking about is how to expose TTS to agents.

Our current thinking is to let agents optionally be configured with a TTS model. Agents would then expose agent.tts.generate() and agent.tts.stream() methods, along with /agents/$AGENT_ID/tts/generate and /agents/$AGENT_ID/tts/stream HTTP endpoints.
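
As a rough, purely illustrative sketch: none of this API exists yet, and the Agent constructor options shown here are assumptions for the sake of the example.

import { Agent } from "@mastra/core";

// Purely illustrative: this API does not exist yet. It sketches what
// configuring an agent with a TTS model might look like; every option
// shown here is an assumption.
const agent = new Agent({
  name: "narrator",
  instructions: "You read your responses aloud.",
  model: { provider: "OPEN_AI", name: "gpt-4o-mini" }, // hypothetical model config
  tts, // hypothetical: the OpenAITTS instance from earlier
});

// The proposed methods would then be available on the agent:
const { audioResult } = await agent.tts.generate({
  text: "Hello from your agent",
  voice: "nova",
});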

Some other questions:

  • How should we expose this functionality in the mastra dev UI?

We figured we would embed a sound clip in the chat UI for agents that have a TTS model configured.

  • How should we expose this functionality in agent memory?

We figured we would probably add a new tts field to items in agent memory, and then we could store the TTS model name there.
