OpenAI
The OpenAIVoice class in Mastra provides text-to-speech and speech-to-text capabilities using OpenAI’s models.
Usage Example
import { OpenAIVoice } from '@mastra/voice-openai';
// Initialize with default configuration using environment variables
const voice = new OpenAIVoice();
// Or initialize with specific configuration
const voiceWithConfig = new OpenAIVoice({
speechModel: {
name: 'tts-1-hd',
apiKey: 'your-openai-api-key'
},
listeningModel: {
name: 'whisper-1',
apiKey: 'your-openai-api-key'
},
speaker: 'alloy' // Default voice
});
// Convert text to speech
const audioStream = await voice.speak('Hello, how can I help you?', {
speaker: 'nova', // Override default voice
speed: 1.2 // Adjust speech speed
});
// Convert speech to text
const text = await voice.listen(audioStream, {
filetype: 'mp3'
});
Configuration
Constructor Options
speechModel?:
OpenAIConfig
= { name: 'tts-1' }
Configuration for text-to-speech synthesis.
listeningModel?:
OpenAIConfig
= { name: 'whisper-1' }
Configuration for speech-to-text recognition.
speaker?:
OpenAIVoiceId
= 'alloy'
Default voice ID for speech synthesis.
OpenAIConfig
name?:
'tts-1' | 'tts-1-hd' | 'whisper-1'
Model name. Use 'tts-1-hd' for higher quality audio.
apiKey?:
string
OpenAI API key. Falls back to OPENAI_API_KEY environment variable.
Methods
speak()
Converts text to speech using OpenAI’s text-to-speech models.
input:
string | NodeJS.ReadableStream
Text or text stream to convert to speech.
options.speaker?:
OpenAIVoiceId
= Constructor's speaker value
Voice ID to use for speech synthesis.
options.speed?:
number
= 1.0
Speech speed multiplier.
Returns: Promise<NodeJS.ReadableStream>
listen()
Transcribes audio using OpenAI’s Whisper model.
audioStream:
NodeJS.ReadableStream
Audio stream to transcribe.
options.filetype?:
string
= 'mp3'
Audio format of the input stream.
Returns: Promise<string>
getSpeakers()
Returns an array of available voice options, where each node contains:
voiceId:
string
Unique identifier for the voice
Notes
- API keys can be provided via constructor options or the
OPENAI_API_KEY
environment variable - The
tts-1-hd
model provides higher quality audio but may have slower processing times - Speech recognition supports multiple audio formats including mp3, wav, and webm