OpenAI

The OpenAIVoice class in Mastra provides text-to-speech and speech-to-text capabilities using OpenAI's models.

Usage Example

import { OpenAIVoice } from "@mastra/voice-openai";

// Initialize with default configuration using environment variables
const voice = new OpenAIVoice();

// Or initialize with specific configuration
const voiceWithConfig = new OpenAIVoice({
  speechModel: {
    name: "tts-1-hd",
    apiKey: "your-openai-api-key",
  },
  listeningModel: {
    name: "whisper-1",
    apiKey: "your-openai-api-key",
  },
  speaker: "alloy", // Default voice
});

// Convert text to speech
const audioStream = await voice.speak("Hello, how can I help you?", {
speaker: "nova", // Override default voice
speed: 1.2, // Adjust speech speed
});

// Convert speech to text
const text = await voice.listen(audioStream, {
filetype: "mp3",
});

Configuration

Constructor Options

speechModel?: OpenAIConfig = { name: 'tts-1' }
Configuration for text-to-speech synthesis.

listeningModel?: OpenAIConfig = { name: 'whisper-1' }
Configuration for speech-to-text recognition.

speaker?: OpenAIVoiceId = 'alloy'
Default voice ID for speech synthesis.

OpenAIConfig

name?: 'tts-1' | 'tts-1-hd' | 'whisper-1'
Model name. Use 'tts-1-hd' for higher quality audio.

apiKey?: string
OpenAI API key. Falls back to the OPENAI_API_KEY environment variable.

Methods

speak()

Converts text to speech using OpenAI's text-to-speech models.

input: string | NodeJS.ReadableStream
Text or text stream to convert to speech.

options.speaker?: OpenAIVoiceId = Constructor's speaker value
Voice ID to use for speech synthesis.

options.speed?: number = 1.0
Speech speed multiplier.

Returns: Promise<NodeJS.ReadableStream>
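
As a sketch of how the returned stream might be consumed, the example below writes the synthesized audio to a local file; the file path and voice choice are illustrative:

import { createWriteStream } from "fs";
import { OpenAIVoice } from "@mastra/voice-openai";

const voice = new OpenAIVoice();

// Synthesize speech; the result is a readable audio stream
const audioStream = await voice.speak("Welcome to Mastra!", {
  speaker: "shimmer", // assumed to be one of the available OpenAI voice IDs
  speed: 0.9,
});

// Pipe the stream into a local MP3 file (path is illustrative)
audioStream.pipe(createWriteStream("welcome.mp3"));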

listen()

Transcribes audio using OpenAI's Whisper model.

audioStream: NodeJS.ReadableStream
Audio stream to transcribe.

options.filetype?: string = 'mp3'
Audio format of the input stream.

Returns: Promise<string>
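
For example, an existing recording can be transcribed by passing a file stream; the file name below is illustrative and assumes a WAV recording on disk:

import { createReadStream } from "fs";
import { OpenAIVoice } from "@mastra/voice-openai";

const voice = new OpenAIVoice();

// Stream an audio file from disk and transcribe it
const recording = createReadStream("meeting-notes.wav");

const transcript = await voice.listen(recording, {
  filetype: "wav", // match the format of the input stream
});

console.log(transcript);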

getSpeakers()

Returns an array of available voice options, where each entry contains:

voiceId: string
Unique identifier for the voice
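
A minimal sketch of listing the available voices; this assumes the result of getSpeakers() can be awaited before iterating:

import { OpenAIVoice } from "@mastra/voice-openai";

const voice = new OpenAIVoice();

// Retrieve the voice list and print each voice ID
const speakers = await voice.getSpeakers();

for (const { voiceId } of speakers) {
  console.log(voiceId);
}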

Notes

  • API keys can be provided via constructor options or the OPENAI_API_KEY environment variable
  • The tts-1-hd model provides higher quality audio but may have slower processing times
  • Speech recognition supports multiple audio formats including mp3, wav, and webm