OpenAI

The OpenAIVoice class in Mastra provides text-to-speech and speech-to-text capabilities using OpenAI's models.

Usage example

```typescript
import { OpenAIVoice } from '@mastra/voice-openai'

// Initialize with default configuration using environment variables
const voice = new OpenAIVoice()

// Or initialize with specific configuration
const voiceWithConfig = new OpenAIVoice({
  speechModel: {
    name: 'tts-1-hd',
    apiKey: 'your-openai-api-key',
  },
  listeningModel: {
    name: 'whisper-1',
    apiKey: 'your-openai-api-key',
  },
  speaker: 'alloy', // Default voice
})

// Convert text to speech
const audioStream = await voice.speak('Hello, how can I help you?', {
  speaker: 'nova', // Override default voice
  speed: 1.2, // Adjust speech speed
})

// Convert speech to text
const text = await voice.listen(audioStream, {
  filetype: 'mp3',
})
```

Configuration

Constructor options

speechModel?: OpenAIConfig = { name: 'tts-1' }
Configuration for text-to-speech synthesis. OpenAIConfig fields:

  name?: 'tts-1' | 'tts-1-hd' | 'whisper-1'
  Model name. For speech synthesis, use 'tts-1', or 'tts-1-hd' for higher-quality audio.

  apiKey?: string
  OpenAI API key. Falls back to the OPENAI_API_KEY environment variable.

listeningModel?: OpenAIConfig = { name: 'whisper-1' }
Configuration for speech-to-text recognition. OpenAIConfig fields:

  name?: 'tts-1' | 'tts-1-hd' | 'whisper-1'
  Model name. For speech recognition, use 'whisper-1'; the tts-1 models are for synthesis only.

  apiKey?: string
  OpenAI API key. Falls back to the OPENAI_API_KEY environment variable.

speaker?: OpenAIVoiceId = 'alloy'
Default voice ID for speech synthesis.
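
The defaults above can be summarized in a small sketch. The `resolveVoiceOptions` helper and its types are hypothetical and not part of `@mastra/voice-openai`; the real class may resolve options differently. The sketch only restates the documented defaults ('tts-1', 'whisper-1', 'alloy') and the OPENAI_API_KEY fallback, and takes the environment as a parameter so it can be exercised without real credentials.

```typescript
// Hypothetical types mirroring the constructor options documented above.
type ModelName = 'tts-1' | 'tts-1-hd' | 'whisper-1';

interface OpenAIConfig {
  name?: ModelName;
  apiKey?: string;
}

interface VoiceOptions {
  speechModel?: OpenAIConfig;
  listeningModel?: OpenAIConfig;
  speaker?: string; // stands in for OpenAIVoiceId
}

interface ResolvedVoiceOptions {
  speechModel: { name: ModelName; apiKey?: string };
  listeningModel: { name: ModelName; apiKey?: string };
  speaker: string;
}

// Hypothetical helper (not a Mastra API): fill in the documented defaults
// and fall back to OPENAI_API_KEY when no explicit key is given.
function resolveVoiceOptions(
  options: VoiceOptions = {},
  env: Record<string, string | undefined> = process.env,
): ResolvedVoiceOptions {
  const envKey = env.OPENAI_API_KEY;
  return {
    speechModel: {
      name: options.speechModel?.name ?? 'tts-1',
      apiKey: options.speechModel?.apiKey ?? envKey,
    },
    listeningModel: {
      name: options.listeningModel?.name ?? 'whisper-1',
      apiKey: options.listeningModel?.apiKey ?? envKey,
    },
    speaker: options.speaker ?? 'alloy',
  };
}
```

An explicit `apiKey` always wins over the environment variable, and each model resolves its key independently, matching the per-model `apiKey` fields in the option list.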

Methods

speak()

Converts text to speech using OpenAI's text-to-speech models.

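input: string | NodeJS.ReadableStream
Text, or a stream of text, to convert to speech.

options?: Options
Configuration options. Options fields:

  speaker?: OpenAIVoiceId
  Voice ID to use for speech synthesis. Overrides the default speaker.

  speed?: number
  Speech speed multiplier. The OpenAI speech API accepts values from 0.25 to 4.0, where 1.0 is normal speed.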

Returns: Promise<NodeJS.ReadableStream>
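
Because speak() resolves to a Node.js readable stream, saving or forwarding the audio usually means draining it first. A minimal sketch using only Node's built-in modules; `streamToBuffer` and `saveSpeech` are hypothetical helper names, and the `.mp3` extension in the usage comment is an assumption that should match the audio format your configuration actually produces.

```typescript
import { writeFile } from 'node:fs/promises';

// Hypothetical helper: collect every chunk of a readable stream
// (such as the one returned by voice.speak()) into one Buffer.
async function streamToBuffer(stream: NodeJS.ReadableStream): Promise<Buffer> {
  const chunks: Buffer[] = [];
  for await (const chunk of stream) {
    // Streams may emit strings or Buffers; normalize everything to Buffer.
    chunks.push(typeof chunk === 'string' ? Buffer.from(chunk, 'utf8') : chunk);
  }
  return Buffer.concat(chunks);
}

// Hypothetical helper: write the synthesized audio to disk.
async function saveSpeech(stream: NodeJS.ReadableStream, filePath: string): Promise<void> {
  await writeFile(filePath, await streamToBuffer(stream));
}

// Usage with Mastra (assumes `voice` is an OpenAIVoice instance):
//   const audio = await voice.speak('Hello!');
//   await saveSpeech(audio, 'hello.mp3');
```

Buffering the whole response is simplest for short clips; for long audio, piping the stream straight to its destination avoids holding it all in memory.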

listen()

Transcribes audio using OpenAI's Whisper model.

audioStream: NodeJS.ReadableStream
Audio stream to transcribe.

options?: Options
Configuration options. Options fields:

  filetype?: string
  Audio format of the input stream, for example 'mp3', 'wav', or 'webm'.

Returns: Promise<string>
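
When the audio comes from a file, the filetype option can be derived from the file name. This is a hypothetical helper, not part of the Mastra API, and it assumes the extension reflects the real encoding; it does not inspect the audio bytes.

```typescript
import { extname } from 'node:path';

// Hypothetical helper (not part of @mastra/voice-openai): derive the
// `filetype` option for listen() from a file name's extension.
function filetypeFromPath(filePath: string): string {
  const ext = extname(filePath).slice(1).toLowerCase(); // '.WAV' -> 'wav'
  if (!ext) {
    throw new Error(`Cannot infer audio format from "${filePath}"`);
  }
  return ext;
}

// Usage with Mastra (assumes `voice` is an OpenAIVoice instance and
// createReadStream is imported from 'node:fs'):
//   const path = 'meeting.wav';
//   const text = await voice.listen(createReadStream(path), {
//     filetype: filetypeFromPath(path),
//   });
```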

getSpeakers()

Returns an array of available voice options, where each entry contains:

  voiceId: string
  Unique identifier for the voice.
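
getSpeakers() is handy for validating a voice ID before calling speak(). The sketch below is a hypothetical helper that returns the preferred voice when it is listed and otherwise falls back to a voice you name (here the documented default, 'alloy'). It is generic over the string literals passed in so the result stays assignable to a narrower voice-ID type; the `await` in the usage comment works whether getSpeakers() returns an array or a promise of one.

```typescript
// Hypothetical helper (not a Mastra API): pick a voice that getSpeakers()
// actually lists, falling back to `fallback` otherwise.
function pickSpeaker<T extends string>(
  speakers: ReadonlyArray<{ voiceId: string }>,
  preferred: T,
  fallback: T,
): T {
  return speakers.some((s) => s.voiceId === preferred) ? preferred : fallback;
}

// Usage with Mastra (assumes `voice` is an OpenAIVoice instance):
//   const speakers = await voice.getSpeakers();
//   const audio = await voice.speak('Hi there', {
//     speaker: pickSpeaker(speakers, 'nova', 'alloy'),
//   });
```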

Notes

  • API keys can be provided via constructor options or the OPENAI_API_KEY environment variable
  • The tts-1-hd model provides higher-quality audio but may be slower to generate
  • Speech recognition supports multiple audio formats including mp3, wav, and webm