ElevenLabs
The ElevenLabs voice implementation in Mastra provides high-quality text-to-speech (TTS) and speech-to-text (STT) capabilities using the ElevenLabs API.
Usage Example
import { ElevenLabsVoice } from "@mastra/voice-elevenlabs";
// Initialize with default configuration (uses ELEVENLABS_API_KEY environment variable)
const voice = new ElevenLabsVoice();
// Alternatively, initialize with custom configuration
const customVoice = new ElevenLabsVoice({
  speechModel: {
    name: 'eleven_multilingual_v2',
    apiKey: 'your-api-key',
  },
  speaker: 'custom-speaker-id',
});
// Text-to-Speech
const audioStream = await voice.speak("Hello, world!");
// Get available speakers
const speakers = await voice.getSpeakers();
Constructor Parameters
speechModel?:
ElevenLabsVoiceConfig
= { name: 'eleven_multilingual_v2' }
Configuration for text-to-speech functionality.
speaker?:
string
= '9BWtsMINqrJLrRacOk9x' (Aria voice)
ID of the speaker to use for text-to-speech
ElevenLabsVoiceConfig
name?:
ElevenLabsModel
= 'eleven_multilingual_v2'
The ElevenLabs model to use
apiKey?:
string
ElevenLabs API key. Falls back to ELEVENLABS_API_KEY environment variable
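The fallback described for apiKey can be pictured with a small pre-flight check. This is a minimal sketch, not the library's source: `VoiceConfigSketch` and `resolveApiKey` are hypothetical names introduced here, and the only behavior taken from the reference is that an explicit key is preferred and ELEVENLABS_API_KEY is used otherwise.

```typescript
// Illustrative only: mirrors the documented fallback, not Mastra's implementation.
interface VoiceConfigSketch {
  name?: string;
  apiKey?: string;
}

// Returns the explicit key when present, otherwise the ELEVENLABS_API_KEY value.
// Throws early so a missing key is caught before any request is made.
function resolveApiKey(
  config: VoiceConfigSketch,
  env: Record<string, string | undefined>,
): string {
  const key = config.apiKey ?? env["ELEVENLABS_API_KEY"];
  if (!key) {
    throw new Error("Set ELEVENLABS_API_KEY or pass speechModel.apiKey");
  }
  return key;
}
```

In an application, a call such as `resolveApiKey({ name: 'eleven_multilingual_v2' }, process.env)` at startup would surface a missing key before the first speak() or listen() call.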
Methods
speak()
Converts text to speech using the configured speech model and voice.
input:
string | NodeJS.ReadableStream
Text to convert to speech. If a stream is provided, it will be converted to text first.
options?:
object
Additional options for speech synthesis
options.speaker?:
string
Override the default speaker ID for this request
Returns: Promise<NodeJS.ReadableStream>
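Because speak() resolves to a stream rather than raw bytes, callers usually need to consume it before saving or forwarding the audio. The sketch below collects a stream into a single Buffer. `streamToBuffer` is a hypothetical helper, and the `Readable.from` stream only stands in for the result of speak() so the example runs without an API key.

```typescript
import { Readable } from "node:stream";

// Collects every chunk from a readable stream into a single Buffer.
async function streamToBuffer(stream: NodeJS.ReadableStream): Promise<Buffer> {
  const chunks: Buffer[] = [];
  for await (const chunk of stream) {
    // Streams may yield strings or Buffers; normalize to Buffer.
    chunks.push(typeof chunk === "string" ? Buffer.from(chunk, "utf8") : chunk);
  }
  return Buffer.concat(chunks);
}

// Stand-in for the stream returned by voice.speak() (assumption: offline demo data).
const fakeAudio = Readable.from([Buffer.from([1, 2]), Buffer.from([3])]);
streamToBuffer(fakeAudio).then((audio) => {
  console.log(`collected ${audio.length} bytes`);
});
```

With a real instance, the same helper could be applied as `streamToBuffer(await voice.speak("Hello, world!"))` and the result written to disk with `fs/promises`. The audio container format of the output is not stated in this reference, so any file extension chosen is an assumption.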
getSpeakers()
Returns an array of available voice options, where each entry contains:
voiceId:
string
Unique identifier for the voice
name:
string
Display name of the voice
language:
string
Language code for the voice
gender:
string
Gender of the voice
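The fields above make it straightforward to pick a speaker programmatically. The following is a minimal sketch: `SpeakerInfo` copies the documented field list, `findSpeakerId` is a hypothetical helper, and the sample entries are made-up placeholders. The actual language and gender strings returned by the API are not specified here, so values such as 'en' or 'female' are assumptions.

```typescript
// Shape of each entry, taken from the field list above.
interface SpeakerInfo {
  voiceId: string;
  name: string;
  language: string;
  gender: string;
}

// Returns the voiceId of the first speaker matching every provided criterion,
// or undefined when nothing matches.
function findSpeakerId(
  speakers: SpeakerInfo[],
  criteria: { language?: string; gender?: string },
): string | undefined {
  const match = speakers.find(
    (s) =>
      (criteria.language === undefined || s.language === criteria.language) &&
      (criteria.gender === undefined || s.gender === criteria.gender),
  );
  return match?.voiceId;
}

// Made-up sample data; real values come from `await voice.getSpeakers()`.
const sampleSpeakers: SpeakerInfo[] = [
  { voiceId: "voice-a", name: "Sample A", language: "en", gender: "female" },
  { voiceId: "voice-b", name: "Sample B", language: "fr", gender: "male" },
];
```

The returned ID can then be passed as options.speaker to speak(), or as speaker in the constructor.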
listen()
Converts audio input to text using the ElevenLabs Speech-to-Text API.
input:
NodeJS.ReadableStream
A readable stream containing the audio data to transcribe
options?:
object
Configuration options for the transcription
The options object supports the following properties:
language_code?:
string
ISO language code (e.g., 'en', 'fr', 'es')
tag_audio_events?:
boolean
Whether to tag audio events like [MUSIC], [LAUGHTER], etc.
num_speakers?:
number
Number of speakers to detect in the audio
filetype?:
string
Audio file format (e.g., 'mp3', 'wav', 'ogg')
timeoutInSeconds?:
number
Request timeout in seconds
maxRetries?:
number
Maximum number of retry attempts
abortSignal?:
AbortSignal
Signal to abort the request
Returns: Promise<string>
- A Promise that resolves to the transcribed text
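The abortSignal option allows caller-controlled cancellation in addition to timeoutInSeconds. The sketch below wraps listen() with a deadline. `ListenOptionsSketch`, `Transcriber`, and `transcribeWithDeadline` are hypothetical names introduced here. The option fields are copied from the list above, and a stub is used in place of a real voice instance so the example runs offline.

```typescript
import { Readable } from "node:stream";

// Subset of the documented listen() options used in this sketch.
interface ListenOptionsSketch {
  language_code?: string;
  tag_audio_events?: boolean;
  num_speakers?: number;
  filetype?: string;
  abortSignal?: AbortSignal;
}

// Structural type: any object with a compatible listen() method fits.
interface Transcriber {
  listen(input: NodeJS.ReadableStream, options?: ListenOptionsSketch): Promise<string>;
}

// Transcribes audio and signals an abort after `ms` milliseconds.
async function transcribeWithDeadline(
  voice: Transcriber,
  audio: NodeJS.ReadableStream,
  ms: number,
  options: ListenOptionsSketch = {},
): Promise<string> {
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), ms);
  try {
    return await voice.listen(audio, { ...options, abortSignal: controller.signal });
  } finally {
    clearTimeout(timer); // release the timer on success or failure
  }
}

// Stub standing in for an ElevenLabsVoice instance (assumption: offline demo).
const stubVoice: Transcriber = {
  listen: async (_input, options) =>
    `transcript (language: ${options?.language_code ?? "auto"})`,
};

transcribeWithDeadline(stubVoice, Readable.from([Buffer.from("audio")]), 5000, {
  language_code: "en",
}).then((text) => console.log(text));
```

Assuming the real listen() signature matches what is documented here, an ElevenLabsVoice instance could be passed in place of the stub, with the audio supplied by something like `fs.createReadStream`.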
Important Notes
- An ElevenLabs API key is required. Set it via the ELEVENLABS_API_KEY environment variable or pass it in the constructor.
- The default speaker is set to Aria (ID: '9BWtsMINqrJLrRacOk9x').
- Speech-to-text is available through the listen() method, which uses the ElevenLabs Speech-to-Text API.
- Available speakers can be retrieved using the getSpeakers() method, which returns detailed information about each voice, including language and gender.