
Text-to-Speech (TTS)

Text-to-Speech (TTS) in Mastra offers a unified API for synthesizing spoken audio from text using various provider services. This section explains TTS configuration options and implementation methods. For integrating TTS capabilities with agents, refer to the Adding Voice to Agents documentation.

Speech Configuration

To configure TTS in Mastra, pass a speechModel configuration when initializing the voice provider. This configuration accepts parameters such as:

  • name: The specific TTS model to use.
  • apiKey: Your API key for authentication.
  • Provider-specific options: Additional options that may be required or supported by the specific voice provider.

The speaker option is specified separately, alongside speechModel rather than inside it, and selects the voice used for speech synthesis.

Note: All of these parameters are optional. If you omit them, the voice provider's default settings apply; the defaults vary by provider.

Example Configuration

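import { OpenAIVoice } from "@mastra/voice-openai";

// Explicit configuration: choose the TTS model, API key, and default speaker
const voice = new OpenAIVoice({
  speechModel: {
    name: "tts-1-hd",
    apiKey: process.env.OPENAI_API_KEY,
  },
  speaker: "alloy",
});

// Alternatively, when the provider defaults are sufficient,
// the configuration can be omitted entirely:
const defaultVoice = new OpenAIVoice();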

Using the Speak Method

The primary method for TTS is speak(), which converts text to speech. It accepts an optional options object that lets you specify the speaker and other provider-specific settings:

const readableStream = await voice.speak("Hello, world!", {
  speaker: "default", // Optional: specify a speaker
  properties: {
    speed: 1.0, // Optional: adjust speech speed
    pitch: "default", // Optional: specify pitch if supported
  },
});
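
The stream returned by speak() still has to be consumed before the audio can be saved or played. The following is a minimal sketch, assuming the return value behaves like a standard Node.js readable stream; the helper name streamToBuffer is hypothetical and not part of Mastra.

```typescript
import { Readable } from "node:stream";

// Hypothetical helper (not part of Mastra): collect every chunk of an
// audio stream into a single Buffer.
async function streamToBuffer(stream: NodeJS.ReadableStream): Promise<Buffer> {
  const chunks: Buffer[] = [];
  for await (const chunk of stream) {
    // Streams may yield strings or Buffers; normalize to Buffer.
    chunks.push(typeof chunk === "string" ? Buffer.from(chunk) : chunk);
  }
  return Buffer.concat(chunks);
}

// Stand-in for the stream returned by voice.speak(), so the sketch runs
// without a provider or an API key.
const demoStream = Readable.from([Buffer.from("audio-"), Buffer.from("bytes")]);
const demoAudio: Promise<Buffer> = streamToBuffer(demoStream);
```

With a real provider, the same helper could be applied to readableStream from the call above and the result written to disk with fs.writeFile. The appropriate file extension depends on the audio format the provider returns.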

Note: If you are using a voice-to-voice provider, such as OpenAIRealtimeVoice, the speak() method emits a “speaking” event instead of returning a readable stream.
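
With an event-based provider, audio is gathered by a listener rather than read from a return value. The sketch below illustrates the pattern with a plain Node.js EventEmitter standing in for the voice provider. The payload shape ({ audio }) and the helper name collectSpeakingAudio are assumptions made for illustration, so check the provider's reference for the actual event payload.

```typescript
import { EventEmitter } from "node:events";

// Assumed payload shape for illustration only; the real provider's
// "speaking" event may carry different fields.
interface SpeakingEvent {
  audio: Uint8Array;
}

// Hypothetical helper: subscribe to "speaking" events and return a function
// that unsubscribes and yields all audio received so far.
function collectSpeakingAudio(emitter: EventEmitter): () => Buffer {
  const chunks: Buffer[] = [];
  const onSpeaking = (event: SpeakingEvent): void => {
    chunks.push(Buffer.from(event.audio));
  };
  emitter.on("speaking", onSpeaking);
  return () => {
    emitter.off("speaking", onSpeaking);
    return Buffer.concat(chunks);
  };
}

// Stand-in for a voice-to-voice provider, so the sketch runs on its own.
const fakeVoice = new EventEmitter();
const finish = collectSpeakingAudio(fakeVoice);
fakeVoice.emit("speaking", { audio: new Uint8Array([1, 2]) });
fakeVoice.emit("speaking", { audio: new Uint8Array([3]) });
const collected = finish();
```

Registering the listener before calling speak() avoids missing the first chunks, since events emitted before a listener is attached are not replayed.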