OpenAI

The OpenAIVoice class in Mastra provides text-to-speech and speech-to-text capabilities using OpenAI’s models.

Usage Example


import { OpenAIVoice } from "@mastra/voice-openai";
 
// Initialize with default configuration using environment variables
const voice = new OpenAIVoice();
 
// Or initialize with specific configuration
const voiceWithConfig = new OpenAIVoice({
  speechModel: {
    name: "tts-1-hd",
    apiKey: "your-openai-api-key",
  },
  listeningModel: {
    name: "whisper-1",
    apiKey: "your-openai-api-key",
  },
  speaker: "alloy", // Default voice
});
 
// Convert text to speech
const audioStream = await voice.speak("Hello, how can I help you?", {
  speaker: "nova", // Override default voice
  speed: 1.2, // Adjust speech speed
});
 
// Convert speech to text
const text = await voice.listen(audioStream, {
  filetype: "mp3",
});

Configuration

Constructor Options

speechModel?:

OpenAIConfig

= { name: 'tts-1' }

Configuration for text-to-speech synthesis.

listeningModel?:

OpenAIConfig

= { name: 'whisper-1' }

Configuration for speech-to-text recognition.

speaker?:

OpenAIVoiceId

= 'alloy'

Default voice ID for speech synthesis.

OpenAIConfig

name?:

'tts-1' | 'tts-1-hd' | 'whisper-1'

Model name. Use 'tts-1' or 'tts-1-hd' (higher quality audio) for speechModel, and 'whisper-1' for listeningModel.

apiKey?:

string

OpenAI API key. Falls back to OPENAI_API_KEY environment variable.
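The fallback order described above (explicit apiKey first, then OPENAI_API_KEY) can be sketched as a small standalone function. This only illustrates the documented behaviour and is not the library's actual code; `resolveApiKey` and the `Env` type are hypothetical names, and treating an empty string as "missing" is an assumption.

```typescript
// Hypothetical helper, not part of @mastra/voice-openai.
type Env = Record<string, string | undefined>;

function resolveApiKey(explicitKey: string | undefined, env: Env): string {
  // Prefer the key passed in the config, then the environment variable.
  // Assumption: an empty string counts as "not provided".
  const key = explicitKey || env["OPENAI_API_KEY"];
  if (!key) {
    throw new Error("No OpenAI API key: set apiKey or OPENAI_API_KEY");
  }
  return key;
}

// In application code the second argument would typically be process.env:
// const apiKey = resolveApiKey(undefined, process.env);
```

Resolving the key once at startup this way surfaces a missing key immediately instead of on the first speak() or listen() call.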

Methods

speak()

Converts text to speech using OpenAI’s text-to-speech models.

input:

string | NodeJS.ReadableStream

Text or text stream to convert to speech.

options.speaker?:

OpenAIVoiceId

= Constructor's speaker value

Voice ID to use for speech synthesis.

options.speed?:

number

= 1.0

Speech speed multiplier. OpenAI's speech API accepts values from 0.25 to 4.0.

Returns: Promise<NodeJS.ReadableStream>
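Because speak() resolves to a readable stream rather than a file, a common next step is to collect the audio and write it to disk. The sketch below shows one way to do that with Node's standard library; `collectAudio` and `saveAudio` are hypothetical helpers, not part of the package, and buffering the whole clip in memory is assumed to be acceptable (reasonable for short utterances).

```typescript
import { writeFile } from "node:fs/promises";

// Hypothetical helper: gathers every chunk of an audio stream into one Buffer.
// NodeJS.ReadableStream is async-iterable, so the value returned by speak() fits here.
async function collectAudio(audio: AsyncIterable<string | Uint8Array>): Promise<Buffer> {
  const chunks: Buffer[] = [];
  for await (const chunk of audio) {
    // Streams may yield strings or binary chunks; normalize both to Buffer.
    chunks.push(typeof chunk === "string" ? Buffer.from(chunk) : Buffer.from(chunk));
  }
  return Buffer.concat(chunks);
}

// Hypothetical helper: buffers the audio, writes it to disk, returns the byte count.
async function saveAudio(audio: AsyncIterable<string | Uint8Array>, path: string): Promise<number> {
  const data = await collectAudio(audio);
  await writeFile(path, data);
  return data.length;
}

// Usage with the `voice` instance from the example at the top of the page.
// The .mp3 extension is a guess at the output format, not something this page states:
// const bytes = await saveAudio(await voice.speak("Hello"), "hello.mp3");
```

For long audio, piping the stream straight into a file write stream would avoid holding the whole clip in memory.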

listen()

Transcribes audio using OpenAI’s Whisper model.

audioStream:

NodeJS.ReadableStream

Audio stream to transcribe.

options.filetype?:

string

= 'mp3'

Audio format of the input stream.

Returns: Promise<string>
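When the audio comes from a file, the filetype option can be derived from the file name instead of being hard-coded. A minimal sketch, assuming only the three formats named on this page; `filetypeFromPath` and `KNOWN_FILETYPES` are hypothetical names, and the set of formats the API actually accepts may be larger.

```typescript
// Formats named on this page; the real set accepted by the API may be larger.
const KNOWN_FILETYPES = ["mp3", "wav", "webm"] as const;
type KnownFiletype = (typeof KNOWN_FILETYPES)[number];

// Hypothetical helper: derives the `filetype` option from a file name.
function filetypeFromPath(path: string, fallback: KnownFiletype = "mp3"): KnownFiletype {
  const dot = path.lastIndexOf(".");
  // Everything after the last dot, lower-cased; empty if there is no dot.
  const ext = dot === -1 ? "" : path.slice(dot + 1).toLowerCase();
  const match = KNOWN_FILETYPES.find((t) => t === ext);
  // Unknown or missing extensions fall back to the documented default.
  return match ?? fallback;
}

// Usage with the `voice` instance from the example at the top of the page:
// const text = await voice.listen(stream, { filetype: filetypeFromPath("clip.wav") });
```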

getSpeakers()

Returns an array of available voice options, where each entry contains:

voiceId:

string

Unique identifier for the voice.
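One use for this list is to check a requested voice before passing it to speak(). The sketch below assumes only the voiceId field documented above; `Speaker` and `pickSpeaker` are hypothetical names introduced for illustration.

```typescript
// Shape described above: each entry carries at least a voiceId.
interface Speaker {
  voiceId: string;
}

// Hypothetical helper: returns `preferred` if it is listed, otherwise `fallback`.
function pickSpeaker(speakers: Speaker[], preferred: string, fallback: string): string {
  return speakers.some((s) => s.voiceId === preferred) ? preferred : fallback;
}

// Usage with the `voice` instance from the example at the top of the page.
// `await` is harmless whether or not getSpeakers() returns a promise:
// const speakers = await voice.getSpeakers();
// const speaker = pickSpeaker(speakers, "nova", "alloy");
```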

Notes

API keys can be provided through the constructor options or the OPENAI_API_KEY environment variable.
The tts-1-hd model produces higher quality audio but may take longer to generate it.
Speech recognition supports multiple audio formats, including mp3, wav, and webm.