ElevenLabs

The ElevenLabs voice implementation in Mastra provides high-quality text-to-speech (TTS) and speech-to-text (STT) capabilities using the ElevenLabs API.

Usage example

import { ElevenLabsVoice } from '@mastra/voice-elevenlabs'

// Initialize with the default configuration (reads the ELEVENLABS_API_KEY environment variable)
const voice = new ElevenLabsVoice()

// Initialize a second instance with a custom configuration
const customVoice = new ElevenLabsVoice({
  speechModel: {
    name: 'eleven_multilingual_v2',
    apiKey: 'your-api-key',
  },
  speaker: 'custom-speaker-id',
})

// Text-to-speech: resolves to a NodeJS.ReadableStream of audio
const audioStream = await voice.speak('Hello, world!')

// List the available speakers
const speakers = await voice.getSpeakers()

Constructor parameters

speechModel?:

ElevenLabsVoiceConfig
= { name: 'eleven_multilingual_v2' }
Configuration for text-to-speech functionality.
ElevenLabsVoiceConfig properties:

name?:

ElevenLabsModel
The ElevenLabs model to use

apiKey?:

string
ElevenLabs API key. Falls back to the ELEVENLABS_API_KEY environment variable if omitted.

speaker?:

string
= '9BWtsMINqrJLrRacOk9x' (Aria voice)
ID of the speaker to use for text-to-speech. This is a top-level constructor option, not part of speechModel.
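The constructor defaults above can be summarized as a small resolution step. The sketch below is a hypothetical illustration, not the @mastra/voice-elevenlabs source: resolveVoiceConfig and its interfaces are invented names, and it assumes that an explicit speechModel.apiKey takes precedence over the environment variable and that a missing key is an error.

```typescript
// Hypothetical sketch of how the documented constructor defaults combine.
// Not the library's implementation; all names here are illustrative.

interface SpeechModelConfig {
  name?: string;
  apiKey?: string;
}

interface VoiceOptions {
  speechModel?: SpeechModelConfig;
  speaker?: string; // top-level option, not part of speechModel
}

interface ResolvedVoiceConfig {
  modelName: string;
  apiKey: string;
  speaker: string;
}

// Defaults as documented above
const DEFAULT_MODEL = "eleven_multilingual_v2";
const DEFAULT_SPEAKER = "9BWtsMINqrJLrRacOk9x"; // Aria

function resolveVoiceConfig(
  options: VoiceOptions,
  env: Record<string, string | undefined>,
): ResolvedVoiceConfig {
  // An explicit apiKey wins; otherwise fall back to ELEVENLABS_API_KEY.
  const apiKey = options.speechModel?.apiKey ?? env.ELEVENLABS_API_KEY;
  if (!apiKey) {
    // Assumption: fail fast when no key is available from either source.
    throw new Error(
      "ElevenLabs API key not found: set ELEVENLABS_API_KEY or pass speechModel.apiKey",
    );
  }
  return {
    modelName: options.speechModel?.name ?? DEFAULT_MODEL,
    apiKey,
    speaker: options.speaker ?? DEFAULT_SPEAKER,
  };
}
```

In a Node.js program you would call it as `resolveVoiceConfig({}, process.env)`. Passing the environment explicitly keeps the function easy to test.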

Methods

speak()

Converts text to speech using the configured speech model and voice.

input:

string | NodeJS.ReadableStream
Text to convert to speech. If a stream is provided, it will be converted to text first.

options?:

object
Additional options for speech synthesis. Supported properties:

speaker?:

string
Override the default speaker ID for this request

Returns: Promise<NodeJS.ReadableStream>
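Because speak() resolves to a NodeJS.ReadableStream rather than a finished file, callers usually need to drain it. A minimal sketch using only Node's standard stream API: collectStream is a hypothetical helper, not part of @mastra/voice-elevenlabs, and the Readable.from stream stands in for real input.

```typescript
import { Readable } from "node:stream";

// Read a stream to the end and concatenate its chunks into one Buffer.
// Chunks may arrive as strings (text streams) or Buffers (binary audio).
async function collectStream(stream: NodeJS.ReadableStream): Promise<Buffer> {
  const chunks: Buffer[] = [];
  for await (const chunk of stream) {
    chunks.push(typeof chunk === "string" ? Buffer.from(chunk, "utf8") : chunk);
  }
  return Buffer.concat(chunks);
}

// Stand-in for a text input stream: emits "Hello, " then "world!".
const textStream = Readable.from(["Hello, ", "world!"]);
```

With a `voice` instance from the usage example, `const audio = await collectStream(await voice.speak('Hello, world!'))` yields the complete audio as a Buffer that can be passed to `writeFile` from `node:fs/promises`. For a text stream, call `.toString('utf8')` on the result.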

getSpeakers()

Resolves to an array of available voice options, where each entry contains:

voiceId:

string
Unique identifier for the voice

name:

string
Display name of the voice

language:

string
Language code for the voice

gender:

string
Gender of the voice
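The following is a minimal sketch of choosing a voice from the getSpeakers() result. The SpeakerInfo interface restates the fields listed above. findSpeaker is a hypothetical helper, not part of @mastra/voice-elevenlabs. The case-insensitive comparison assumes language and gender are plain strings such as 'en' or 'female'; the values ElevenLabs actually returns may be formatted differently.

```typescript
// Shape of each entry returned by getSpeakers(), as documented above.
interface SpeakerInfo {
  voiceId: string;
  name: string;
  language: string;
  gender: string;
}

// Hypothetical helper: return the first speaker whose language matches and,
// if a gender is given, whose gender also matches. Undefined if none match.
function findSpeaker(
  speakers: SpeakerInfo[],
  language: string,
  gender?: string,
): SpeakerInfo | undefined {
  return speakers.find(
    (s) =>
      s.language.toLowerCase() === language.toLowerCase() &&
      (gender === undefined || s.gender.toLowerCase() === gender.toLowerCase()),
  );
}
```

The chosen ID can then override the default speaker for a single request: `const match = findSpeaker(await voice.getSpeakers(), 'en'); if (match) await voice.speak('Hello!', { speaker: match.voiceId })`.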

listen()

Converts audio input to text using the ElevenLabs Speech-to-Text API.

input:

NodeJS.ReadableStream
A readable stream containing the audio data to transcribe

options?:

object
Configuration options for the transcription

The options object supports the following properties:

language_code?:

string
ISO language code (e.g., 'en', 'fr', 'es')

tag_audio_events?:

boolean
Whether to tag audio events like [MUSIC], [LAUGHTER], etc.

num_speakers?:

number
Number of speakers to detect in the audio

filetype?:

string
Audio file format (e.g., 'mp3', 'wav', 'ogg')

timeoutInSeconds?:

number
Request timeout in seconds

maxRetries?:

number
Maximum number of retry attempts

abortSignal?:

AbortSignal
Signal to abort the request

Returns: Promise<string> - A Promise that resolves to the transcribed text
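The listen() options mix snake_case and camelCase names, so a typed options object helps catch misspellings. This is a sketch under stated assumptions. ListenOptions restates the list above and may not match the package's exported type. filetypeFromPath and buildListenOptions are hypothetical helpers. The specific values (English, 30-second timeout, 2 retries) are arbitrary examples.

```typescript
// Options accepted by listen(), restated from the list above.
interface ListenOptions {
  language_code?: string;
  tag_audio_events?: boolean;
  num_speakers?: number;
  filetype?: string;
  timeoutInSeconds?: number;
  maxRetries?: number;
  abortSignal?: AbortSignal;
}

// Hypothetical helper: derive the filetype option from a file name's extension.
function filetypeFromPath(path: string): string | undefined {
  const dot = path.lastIndexOf(".");
  // No dot, or the last dot belongs to a directory name rather than the file.
  if (dot === -1 || dot < path.lastIndexOf("/")) return undefined;
  const ext = path.slice(dot + 1).toLowerCase();
  return ext.length > 0 ? ext : undefined;
}

// Hypothetical helper: options for an English transcription of a local file
// that can be cancelled through the supplied AbortController.
function buildListenOptions(path: string, controller: AbortController): ListenOptions {
  return {
    language_code: "en",
    tag_audio_events: true,
    filetype: filetypeFromPath(path),
    timeoutInSeconds: 30,
    maxRetries: 2,
    abortSignal: controller.signal,
  };
}
```

A call might then look like `const text = await voice.listen(createReadStream(path), buildListenOptions(path, controller))`, with `createReadStream` from `node:fs` and `controller.abort()` invoked from your own cancellation logic.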

Important notes

  1. An ElevenLabs API key is required. Set it via the ELEVENLABS_API_KEY environment variable or pass it as speechModel.apiKey in the constructor.
  2. The default speaker is set to Aria (ID: '9BWtsMINqrJLrRacOk9x').
  3. Speech-to-text is supported through the listen() method, which uses the ElevenLabs Speech-to-Text API.
  4. Available speakers can be retrieved using the getSpeakers() method, which returns detailed information about each voice including language and gender.