MastraVoice
The MastraVoice class is an abstract base class that defines the core interface for voice services in Mastra. All voice provider implementations (like OpenAI, Deepgram, PlayAI, Speechify) extend this class to provide their specific functionality. The class now includes support for real-time speech-to-speech capabilities through WebSocket connections.
Usage Example
import { MastraVoice } from "@mastra/core/voice";

// BuiltInModelConfig is declared locally so the example is self-contained;
// its fields follow the Constructor Parameters section below (string types assumed).
type BuiltInModelConfig = { name: string; apiKey?: string };

// Create a voice provider implementation
class MyVoiceProvider extends MastraVoice {
  constructor(config: {
    speechModel?: BuiltInModelConfig;
    listeningModel?: BuiltInModelConfig;
    speaker?: string;
    realtimeConfig?: {
      model?: string;
      apiKey?: string;
      options?: unknown;
    };
  }) {
    super({
      speechModel: config.speechModel,
      listeningModel: config.listeningModel,
      speaker: config.speaker,
      realtimeConfig: config.realtimeConfig
    });
  }

  // Implement required abstract methods
  async speak(input: string | NodeJS.ReadableStream, options?: { speaker?: string }): Promise<NodeJS.ReadableStream | void> {
    // Implement text-to-speech conversion
  }

  async listen(audioStream: NodeJS.ReadableStream, options?: unknown): Promise<string | NodeJS.ReadableStream | void> {
    // Implement speech-to-text conversion
  }

  async getSpeakers(): Promise<Array<{ voiceId: string; [key: string]: unknown }>> {
    // Return list of available voices (empty placeholder so the stub compiles)
    return [];
  }

  // Optional speech-to-speech methods
  async connect(): Promise<void> {
    // Establish WebSocket connection for speech-to-speech communication
  }

  async send(audioData: NodeJS.ReadableStream | Int16Array): Promise<void> {
    // Stream audio data in speech-to-speech
  }

  async answer(): Promise<void> {
    // Trigger voice provider to respond
  }

  addTools(tools: Array<unknown>): void {
    // Add tools for the voice provider to use
  }

  close(): void {
    // Close WebSocket connection
  }

  on(event: string, callback: (data: unknown) => void): void {
    // Register event listener
  }

  off(event: string, callback: (data: unknown) => void): void {
    // Remove event listener
  }
}
Constructor Parameters
config?:
config.speechModel?:
config.listeningModel?:
config.speaker?:
config.name?:
config.realtimeConfig?:
BuiltInModelConfig
name:
apiKey?:
RealtimeConfig
model?:
apiKey?:
options?:
Abstract Methods
These methods must be implemented by any class extending MastraVoice.
speak()
Converts text to speech using the configured speech model.
abstract speak(
input: string | NodeJS.ReadableStream,
options?: {
speaker?: string;
[key: string]: unknown;
}
): Promise<NodeJS.ReadableStream | void>
Purpose:
- Takes text input and converts it to speech using the provider’s text-to-speech service
- Supports both string and stream input for flexibility
- Allows overriding the default speaker/voice through options
- Returns a stream of audio data that can be played or saved
- May return void when the provider delivers the audio by emitting a ‘speaking’ event instead
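Because speak() can resolve to either a stream or void, callers usually need to handle both cases. The following is a minimal sketch of one way to do that; collectAudio is a hypothetical helper (not part of Mastra) and it assumes a Node.js runtime.

```typescript
import { Readable } from "node:stream";

// Result type copied from the speak() signature above.
type SpeakResult = NodeJS.ReadableStream | void;

// Hypothetical helper: drain an audio stream into a single Buffer.
// Returns null when speak() resolved to void, i.e. the audio is expected
// to arrive through the 'speaking' event instead.
async function collectAudio(result: SpeakResult): Promise<Buffer | null> {
  if (!result) return null;
  const chunks: Buffer[] = [];
  for await (const chunk of result) {
    // Chunks may be strings or Buffers depending on the stream's encoding.
    chunks.push(typeof chunk === "string" ? Buffer.from(chunk) : chunk);
  }
  return Buffer.concat(chunks);
}

// Stand-in for a provider result; a real provider would return synthesized audio.
const fakeAudio: NodeJS.ReadableStream = Readable.from([
  Buffer.from([1, 2]),
  Buffer.from([3]),
]);
```

With a real provider, the argument would be the value returned by `await provider.speak("Hello")`, and the resulting Buffer could be written to a file or passed to an audio player.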
listen()
Converts speech to text using the configured listening model.
abstract listen(
audioStream: NodeJS.ReadableStream,
options?: {
[key: string]: unknown;
}
): Promise<string | NodeJS.ReadableStream | void>
Purpose:
- Takes an audio stream and converts it to text using the provider’s speech-to-text service
- Supports provider-specific options for transcription configuration
- Can return either a complete text transcription or a stream of transcribed text
- Not every provider supports speech-to-text (PlayAI and Speechify, for example, do not)
- May return void when the provider delivers the transcription by emitting a ‘writing’ event instead
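The three possible return shapes of listen() can be normalized into a single string. This is a sketch under the assumption that streamed transcriptions are UTF-8 text; toTranscript is a hypothetical helper name, not a Mastra API.

```typescript
import { Readable } from "node:stream";

// Result type copied from the listen() signature above.
type ListenResult = string | NodeJS.ReadableStream | void;

// Hypothetical helper: reduce any listen() result to plain text.
// Returns null when listen() resolved to void, i.e. the text is expected
// to arrive through the 'writing' event instead.
async function toTranscript(result: ListenResult): Promise<string | null> {
  if (typeof result === "string") return result;
  if (!result) return null;
  const parts: Buffer[] = [];
  for await (const chunk of result) {
    parts.push(typeof chunk === "string" ? Buffer.from(chunk, "utf8") : chunk);
  }
  // Decode once at the end so multi-byte characters split across chunks survive.
  return Buffer.concat(parts).toString("utf8");
}

// Stand-in for a streamed transcription.
const fakeTranscript: NodeJS.ReadableStream = Readable.from(["hello ", "world"]);
```

The string check comes first so that an empty transcription ("") is returned as text rather than being mistaken for the void case.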
getSpeakers()
Returns a list of available voices supported by the provider.
abstract getSpeakers(): Promise<Array<{ voiceId: string; [key: string]: unknown }>>
Purpose:
- Retrieves the list of available voices/speakers from the provider
- Each voice must have at least a voiceId property
- Providers can include additional metadata about each voice
- Used to discover available voices for text-to-speech conversion
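Since only voiceId is guaranteed on each entry, code that selects a voice should rely on that field alone. A small sketch follows; pickVoice and the sample data (including the "language" field) are invented for illustration.

```typescript
// Element type copied from the getSpeakers() signature above.
type Speaker = { voiceId: string; [key: string]: unknown };

// Hypothetical helper: use the preferred voice if the provider offers it,
// otherwise fall back to the first available voice.
function pickVoice(speakers: Speaker[], preferred?: string): string | undefined {
  const match = speakers.find((s) => s.voiceId === preferred);
  return (match ?? speakers[0])?.voiceId;
}

// Stand-in data; "language" is an invented example of provider metadata.
const sampleSpeakers: Speaker[] = [
  { voiceId: "voice-a", language: "en" },
  { voiceId: "voice-b", language: "de" },
];

console.log(pickVoice(sampleSpeakers, "voice-b")); // voice-b
console.log(pickVoice(sampleSpeakers, "missing")); // voice-a
```

The selected id can then be passed as the speaker option of speak().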
Optional Methods
These methods have default implementations but can be overridden by voice providers that support speech-to-speech capabilities.
connect()
Establishes a WebSocket or WebRTC connection for real-time speech-to-speech communication.
connect(config?: unknown): Promise<void>
Purpose:
- Initializes a connection to the voice service for real-time communication
- Must be called before using features like send() or answer()
- Returns a Promise that resolves when the connection is established
- Configuration is provider-specific
send()
Streams audio data in real-time to the voice provider.
send(audioData: NodeJS.ReadableStream | Int16Array): Promise<void>
Purpose:
- Sends audio data to the voice provider for real-time processing
- Useful for continuous audio streaming scenarios like live microphone input
- Supports both ReadableStream and Int16Array audio formats
- The connection must already be established with connect() before calling this method
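Audio sources often produce floating-point samples, while send() accepts an Int16Array. The sketch below shows a common conversion to 16-bit PCM, assuming input samples in the range [-1, 1]; floatTo16BitPCM is a hypothetical helper, and the sample rate a provider expects is provider-specific and not covered here.

```typescript
// Convert Float32 samples in [-1, 1] to 16-bit signed PCM.
function floatTo16BitPCM(samples: Float32Array): Int16Array {
  const pcm = new Int16Array(samples.length);
  samples.forEach((sample, i) => {
    // Clamp first so out-of-range input cannot wrap around.
    const s = Math.max(-1, Math.min(1, sample));
    // Negative values scale to -32768, positive values to 32767.
    pcm[i] = Math.round(s < 0 ? s * 0x8000 : s * 0x7fff);
  });
  return pcm;
}

const pcmDemo = floatTo16BitPCM(new Float32Array([-1, 0, 1, 2]));
console.log(Array.from(pcmDemo)); // [ -32768, 0, 32767, 32767 ]
```

The resulting Int16Array can be passed to send() once the connection is established.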
answer()
Triggers the voice provider to generate a response.
answer(): Promise<void>
Purpose:
- Sends a signal to the voice provider to generate a response
- Used in real-time conversations to prompt the AI to respond
- The response is emitted through the event system (e.g., the ‘speaking’ event) rather than returned
addTools()
Equips the voice provider with tools that can be used during conversations.
addTools(tools: Array<Tool>): void
Purpose:
- Adds tools that the voice provider can use during conversations
- Tools can extend the capabilities of the voice provider
- Implementation is provider-specific
close()
Disconnects from the WebSocket or WebRTC connection.
close(): void
Purpose:
- Closes the connection to the voice service
- Cleans up resources and stops any ongoing real-time processing
- Should be called when you’re done with the voice instance
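The ordering constraints described for connect(), send(), answer(), and close() can be sketched as a single conversational turn. RealtimeVoice, runTurn, and RecordingVoice are hypothetical names used only for this illustration; a real provider would extend MastraVoice.

```typescript
// Minimal interface covering only the speech-to-speech methods documented above.
interface RealtimeVoice {
  connect(): Promise<void>;
  send(audioData: Int16Array): Promise<void>;
  answer(): Promise<void>;
  close(): void;
}

// Hypothetical helper: run one turn and always release the connection.
async function runTurn(voice: RealtimeVoice, audio: Int16Array): Promise<void> {
  await voice.connect(); // must complete before send() or answer()
  try {
    await voice.send(audio);
    await voice.answer(); // the reply arrives through events, not a return value
  } finally {
    voice.close(); // runs even if send() or answer() throws
  }
}

// Stand-in implementation that only records the call order.
class RecordingVoice implements RealtimeVoice {
  calls: string[] = [];
  private connected = false;

  async connect(): Promise<void> {
    this.connected = true;
    this.calls.push("connect");
  }

  async send(_audio: Int16Array): Promise<void> {
    if (!this.connected) throw new Error("send() called before connect()");
    this.calls.push("send");
  }

  async answer(): Promise<void> {
    this.calls.push("answer");
  }

  close(): void {
    this.connected = false;
    this.calls.push("close");
  }
}
```

Placing close() in a finally block is a design choice that keeps the connection from leaking when a turn fails partway through.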
on()
Registers an event listener for voice events.
on<E extends VoiceEventType>(
event: E,
callback: (data: E extends keyof VoiceEventMap ? VoiceEventMap[E] : unknown) => void,
): void
Purpose:
- Registers a callback function to be called when the specified event occurs
- Standard events include ‘speaking’, ‘writing’, and ‘error’
- Providers can emit custom events as well
- Event data structure depends on the event type
off()
Removes an event listener.
off<E extends VoiceEventType>(
event: E,
callback: (data: E extends keyof VoiceEventMap ? VoiceEventMap[E] : unknown) => void,
): void
Purpose:
- Removes a previously registered event listener
- Used to clean up event handlers when they’re no longer needed
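The on()/off() pairing can be illustrated with a small stand-in emitter. Everything here is hypothetical: DemoEventMap and its payload shapes are invented for the sketch, and the real VoiceEventMap and provider implementations may differ.

```typescript
// Hypothetical event payloads; the real VoiceEventMap may differ.
interface DemoEventMap {
  writing: { text: string };
  error: { message: string };
}

type Listener<T> = (data: T) => void;

// Minimal stand-in exposing the same on()/off() shape as documented above.
class DemoEmitter {
  private listeners = new Map<string, Set<Listener<unknown>>>();

  on<E extends keyof DemoEventMap>(event: E, callback: Listener<DemoEventMap[E]>): void {
    const set = this.listeners.get(event) ?? new Set<Listener<unknown>>();
    set.add(callback as Listener<unknown>);
    this.listeners.set(event, set);
  }

  off<E extends keyof DemoEventMap>(event: E, callback: Listener<DemoEventMap[E]>): void {
    this.listeners.get(event)?.delete(callback as Listener<unknown>);
  }

  emit<E extends keyof DemoEventMap>(event: E, data: DemoEventMap[E]): void {
    this.listeners.get(event)?.forEach((listener) => listener(data));
  }
}

const emitter = new DemoEmitter();
const received: string[] = [];
const onWriting = (data: { text: string }): void => {
  received.push(data.text);
};

emitter.on("writing", onWriting);
emitter.emit("writing", { text: "hello" });
emitter.off("writing", onWriting); // same function reference that was passed to on()
emitter.emit("writing", { text: "ignored" });
console.log(received); // only "hello" was recorded
```

In this sketch, off() only removes a listener when it receives the same function reference that was registered, so handlers that need to be removed later should be stored in a variable rather than passed inline.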
Event System
The MastraVoice class includes an event system for real-time communication. Standard event types include:
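speaking: Emitted when the provider produces audio output (see speak() and answer())
writing: Emitted when the provider produces transcribed text (see listen())
error: Emitted when an error occurs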
Protected Properties
listeningModel?:
speechModel?:
speaker?:
realtimeConfig?:
Telemetry Support
MastraVoice includes built-in telemetry support through the traced method, which wraps method calls with performance tracking and error monitoring.
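The general pattern of wrapping a method with timing and error capture can be sketched as follows. This is not Mastra's traced implementation; withTracing and SpanRecord are hypothetical names, and the in-memory array stands in for a real telemetry backend.

```typescript
// Hypothetical record of one traced call.
type SpanRecord = { name: string; durationMs: number; error?: string };

// Wrap an async function so each call records its duration and any error.
function withTracing<A extends unknown[], R>(
  name: string,
  fn: (...args: A) => Promise<R>,
  sink: SpanRecord[],
): (...args: A) => Promise<R> {
  return async (...args: A): Promise<R> => {
    const start = Date.now();
    try {
      const result = await fn(...args);
      sink.push({ name, durationMs: Date.now() - start });
      return result;
    } catch (err) {
      sink.push({ name, durationMs: Date.now() - start, error: String(err) });
      throw err; // tracing must not swallow the failure
    }
  };
}
```

The wrapper returns a function with the same signature as the original, so callers do not need to change when tracing is added.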
Notes
- MastraVoice is an abstract class and cannot be instantiated directly
- Implementations must provide concrete implementations for all abstract methods
- The class provides a consistent interface across different voice service providers
- Speech-to-speech capabilities are optional and provider-specific
- The event system enables asynchronous communication for real-time interactions
- Telemetry is automatically handled for all method calls