MastraVoice

The MastraVoice class is an abstract base class that defines the core interface for voice services in Mastra. All voice provider implementations (like OpenAI, Deepgram, PlayAI, Speechify) extend this class to provide their specific functionality. The class now includes support for real-time speech-to-speech capabilities through WebSocket connections.

Usage Example


import { MastraVoice } from "@mastra/core/voice";
 
// Create a voice provider implementation
class MyVoiceProvider extends MastraVoice {
  constructor(config: {
    speechModel?: BuiltInModelConfig;
    listeningModel?: BuiltInModelConfig;
    speaker?: string;
    realtimeConfig?: {
      model?: string;
      apiKey?: string;
      options?: unknown;
    };
  }) {
    super({
      speechModel: config.speechModel,
      listeningModel: config.listeningModel,
      speaker: config.speaker,
      realtimeConfig: config.realtimeConfig,
    });
  }
 
  // Implement required abstract methods
  async speak(
    input: string | NodeJS.ReadableStream,
    options?: { speaker?: string },
  ): Promise<NodeJS.ReadableStream | void> {
    // Implement text-to-speech conversion
  }
 
  async listen(
    audioStream: NodeJS.ReadableStream,
    options?: unknown,
  ): Promise<string | NodeJS.ReadableStream | void> {
    // Implement speech-to-text conversion
  }
 
  async getSpeakers(): Promise<
    Array<{ voiceId: string; [key: string]: unknown }>
  > {
    // Return list of available voices
  }
 
  // Optional speech-to-speech methods
  async connect(): Promise<void> {
    // Establish WebSocket connection for speech-to-speech communication
  }
 
  async send(audioData: NodeJS.ReadableStream | Int16Array): Promise<void> {
    // Stream audio data in speech-to-speech
  }
 
  async answer(): Promise<void> {
    // Trigger voice provider to respond
  }
 
  addTools(tools: Array<unknown>): void {
    // Add tools for the voice provider to use
  }
 
  close(): void {
    // Close WebSocket connection
  }
 
  on(event: string, callback: (data: unknown) => void): void {
    // Register event listener
  }
 
  off(event: string, callback: (data: unknown) => void): void {
    // Remove event listener
  }
}

Constructor Parameters

config?:

VoiceConfig

Configuration object for the voice service

config.speechModel?:

BuiltInModelConfig

Configuration for the text-to-speech model

config.listeningModel?:

BuiltInModelConfig

Configuration for the speech-to-text model

config.speaker?:

string

Default speaker/voice ID to use

config.name?:

string

Name for the voice provider instance

config.realtimeConfig?:

object

Configuration for real-time speech-to-speech capabilities

BuiltInModelConfig

name:

string

Name of the model to use

apiKey?:

string

API key for the model service

RealtimeConfig

model?:

string

Model to use for real-time speech-to-speech capabilities

apiKey?:

string

API key for the real-time service

options?:

unknown

Provider-specific options for real-time capabilities

Abstract Methods

These methods must be implemented by unknown class extending MastraVoice.

speak()

Converts text to speech using the configured speech model.


abstract speak(
  input: string | NodeJS.ReadableStream,
  options?: {
    speaker?: string;
    [key: string]: unknown;
  }
): Promise<NodeJS.ReadableStream | void>

Purpose:

Takes text input and converts it to speech using the provider’s text-to-speech service
Supports both string and stream input for flexibility
Allows overriding the default speaker/voice through options
Returns a stream of audio data that can be played or saved
May return void if the audio is handled by emitting ‘speaking’ event

listen()

Converts speech to text using the configured listening model.


abstract listen(
  audioStream: NodeJS.ReadableStream,
  options?: {
    [key: string]: unknown;
  }
): Promise<string | NodeJS.ReadableStream | void>

Purpose:

Takes an audio stream and converts it to text using the provider’s speech-to-text service
Supports provider-specific options for transcription configuration
Can return either a complete text transcription or a stream of transcribed text
Not all providers support this functionality (e.g., PlayAI, Speechify)
May return void if the transcription is handled by emitting ‘writing’ event

getSpeakers()

Returns a list of available voices supported by the provider.


abstract getSpeakers(): Promise<Array<{ voiceId: string; [key: string]: unknown }>>

Purpose:

Retrieves the list of available voices/speakers from the provider
Each voice must have at least a voiceId property
Providers can include additional metadata about each voice
Used to discover available voices for text-to-speech conversion

Optional Methods

These methods have default implementations but can be overridden by voice providers that support speech-to-speech capabilities.

connect()

Establishes a WebSocket or WebRTC connection for communication.


connect(config?: unknown): Promise<void>

Purpose:

Initializes a connection to the voice service for communication
Must be called before using features like send() or answer()
Returns a Promise that resolves when the connection is established
Configuration is provider-specific

send()

Streams audio data in real-time to the voice provider.


send(audioData: NodeJS.ReadableStream | Int16Array): Promise<void>

Purpose:

Sends audio data to the voice provider for real-time processing
Useful for continuous audio streaming scenarios like live microphone input
Supports both ReadableStream and Int16Array audio formats
Must be in connected state before calling this method

answer()

Triggers the voice provider to generate a response.


answer(): Promise<void>

Purpose:

Sends a signal to the voice provider to generate a response
Used in real-time conversations to prompt the AI to respond
Response will be emitted through the event system (e.g., ‘speaking’ event)

addTools()

Equips the voice provider with tools that can be used during conversations.


addTools(tools: Array<Tool>): void

Purpose:

Adds tools that the voice provider can use during conversations
Tools can extend the capabilities of the voice provider
Implementation is provider-specific

close()

Disconnects from the WebSocket or WebRTC connection.


close(): void

Purpose:

Closes the connection to the voice service
Cleans up resources and stops any ongoing real-time processing
Should be called when you’re done with the voice instance

on()

Registers an event listener for voice events.


on<E extends VoiceEventType>(
  event: E,
  callback: (data: E extends keyof VoiceEventMap ? VoiceEventMap[E] : unknown) => void,
): void

Purpose:

Registers a callback function to be called when the specified event occurs
Standard events include ‘speaking’, ‘writing’, and ‘error’
Providers can emit custom events as well
Event data structure depends on the event type

off()

Removes an event listener.


off<E extends VoiceEventType>(
  event: E,
  callback: (data: E extends keyof VoiceEventMap ? VoiceEventMap[E] : unknown) => void,
): void

Purpose:

Removes a previously registered event listener
Used to clean up event handlers when they’re no longer needed

Event System

The MastraVoice class includes an event system for real-time communication. Standard event types include:

speaking:

{ text: string; audioStream?: NodeJS.ReadableStream; audio?: Int16Array }

Emitted when the voice provider is speaking, contains audio data

writing:

{ text: string, role: string }

Emitted when text is transcribed from speech

error:

{ message: string; code?: string; details?: unknown }

Emitted when an error occurs

Protected Properties

listeningModel?:

BuiltInModelConfig | undefined

Configuration for the speech-to-text model

speechModel?:

BuiltInModelConfig | undefined

Configuration for the text-to-speech model

speaker?:

string | undefined

Default speaker/voice ID

realtimeConfig?:

{ model?: string; apiKey?: string; options?: unknown } | undefined

Configuration for real-time speech-to-speech capabilities

Telemetry Support

MastraVoice includes built-in telemetry support through the traced method, which wraps method calls with performance tracking and error monitoring.

Notes

MastraVoice is an abstract class and cannot be instantiated directly
Implementations must provide concrete implementations for all abstract methods
The class provides a consistent interface across different voice service providers
Speech-to-speech capabilities are optional and provider-specific
The event system enables asynchronous communication for real-time interactions
Telemetry is automatically handled for all method calls

MastraVoice

Usage Example


import { MastraVoice } from "@mastra/core/voice";
 
// Create a voice provider implementation
class MyVoiceProvider extends MastraVoice {
  constructor(config: {
    speechModel?: BuiltInModelConfig;
    listeningModel?: BuiltInModelConfig;
    speaker?: string;
    realtimeConfig?: {
      model?: string;
      apiKey?: string;
      options?: unknown;
    };
  }) {
    super({
      speechModel: config.speechModel,
      listeningModel: config.listeningModel,
      speaker: config.speaker,
      realtimeConfig: config.realtimeConfig,
    });
  }
 
  // Implement required abstract methods
  async speak(
    input: string | NodeJS.ReadableStream,
    options?: { speaker?: string },
  ): Promise<NodeJS.ReadableStream | void> {
    // Implement text-to-speech conversion
  }
 
  async listen(
    audioStream: NodeJS.ReadableStream,
    options?: unknown,
  ): Promise<string | NodeJS.ReadableStream | void> {
    // Implement speech-to-text conversion
  }
 
  async getSpeakers(): Promise<
    Array<{ voiceId: string; [key: string]: unknown }>
  > {
    // Return list of available voices
  }
 
  // Optional speech-to-speech methods
  async connect(): Promise<void> {
    // Establish WebSocket connection for speech-to-speech communication
  }
 
  async send(audioData: NodeJS.ReadableStream | Int16Array): Promise<void> {
    // Stream audio data in speech-to-speech
  }
 
  async answer(): Promise<void> {
    // Trigger voice provider to respond
  }
 
  addTools(tools: Array<unknown>): void {
    // Add tools for the voice provider to use
  }
 
  close(): void {
    // Close WebSocket connection
  }
 
  on(event: string, callback: (data: unknown) => void): void {
    // Register event listener
  }
 
  off(event: string, callback: (data: unknown) => void): void {
    // Remove event listener
  }
}

Constructor Parameters

config?:

VoiceConfig

Configuration object for the voice service

config.speechModel?:

BuiltInModelConfig

Configuration for the text-to-speech model

config.listeningModel?:

BuiltInModelConfig

Configuration for the speech-to-text model

config.speaker?:

string

Default speaker/voice ID to use

config.name?:

string

Name for the voice provider instance

config.realtimeConfig?:

object

Configuration for real-time speech-to-speech capabilities

BuiltInModelConfig

name:

string

Name of the model to use

apiKey?:

string

API key for the model service

RealtimeConfig

model?:

string

Model to use for real-time speech-to-speech capabilities

apiKey?:

string

API key for the real-time service

options?:

unknown

Provider-specific options for real-time capabilities

Abstract Methods

These methods must be implemented by unknown class extending MastraVoice.

speak()

Converts text to speech using the configured speech model.


abstract speak(
  input: string | NodeJS.ReadableStream,
  options?: {
    speaker?: string;
    [key: string]: unknown;
  }
): Promise<NodeJS.ReadableStream | void>

Purpose:

Takes text input and converts it to speech using the provider’s text-to-speech service
Supports both string and stream input for flexibility
Allows overriding the default speaker/voice through options
Returns a stream of audio data that can be played or saved
May return void if the audio is handled by emitting ‘speaking’ event

listen()

Converts speech to text using the configured listening model.


abstract listen(
  audioStream: NodeJS.ReadableStream,
  options?: {
    [key: string]: unknown;
  }
): Promise<string | NodeJS.ReadableStream | void>

Purpose:

Takes an audio stream and converts it to text using the provider’s speech-to-text service
Supports provider-specific options for transcription configuration
Can return either a complete text transcription or a stream of transcribed text
Not all providers support this functionality (e.g., PlayAI, Speechify)
May return void if the transcription is handled by emitting ‘writing’ event

getSpeakers()

Returns a list of available voices supported by the provider.


abstract getSpeakers(): Promise<Array<{ voiceId: string; [key: string]: unknown }>>

Purpose:

Retrieves the list of available voices/speakers from the provider
Each voice must have at least a voiceId property
Providers can include additional metadata about each voice
Used to discover available voices for text-to-speech conversion

Optional Methods

These methods have default implementations but can be overridden by voice providers that support speech-to-speech capabilities.

connect()

Establishes a WebSocket or WebRTC connection for communication.


connect(config?: unknown): Promise<void>

Purpose:

Initializes a connection to the voice service for communication
Must be called before using features like send() or answer()
Returns a Promise that resolves when the connection is established
Configuration is provider-specific

send()

Streams audio data in real-time to the voice provider.


send(audioData: NodeJS.ReadableStream | Int16Array): Promise<void>

Purpose:

Sends audio data to the voice provider for real-time processing
Useful for continuous audio streaming scenarios like live microphone input
Supports both ReadableStream and Int16Array audio formats
Must be in connected state before calling this method

answer()

Triggers the voice provider to generate a response.


answer(): Promise<void>

Purpose:

Sends a signal to the voice provider to generate a response
Used in real-time conversations to prompt the AI to respond
Response will be emitted through the event system (e.g., ‘speaking’ event)

addTools()

Equips the voice provider with tools that can be used during conversations.


addTools(tools: Array<Tool>): void

Purpose:

Adds tools that the voice provider can use during conversations
Tools can extend the capabilities of the voice provider
Implementation is provider-specific

close()

Disconnects from the WebSocket or WebRTC connection.


close(): void

Purpose:

Closes the connection to the voice service
Cleans up resources and stops any ongoing real-time processing
Should be called when you’re done with the voice instance

on()

Registers an event listener for voice events.


on<E extends VoiceEventType>(
  event: E,
  callback: (data: E extends keyof VoiceEventMap ? VoiceEventMap[E] : unknown) => void,
): void

Purpose:

Registers a callback function to be called when the specified event occurs
Standard events include ‘speaking’, ‘writing’, and ‘error’
Providers can emit custom events as well
Event data structure depends on the event type

off()

Removes an event listener.


off<E extends VoiceEventType>(
  event: E,
  callback: (data: E extends keyof VoiceEventMap ? VoiceEventMap[E] : unknown) => void,
): void

Purpose:

Removes a previously registered event listener
Used to clean up event handlers when they’re no longer needed

Event System

The MastraVoice class includes an event system for real-time communication. Standard event types include:

speaking:

{ text: string; audioStream?: NodeJS.ReadableStream; audio?: Int16Array }

Emitted when the voice provider is speaking, contains audio data

writing:

{ text: string, role: string }

Emitted when text is transcribed from speech

error:

{ message: string; code?: string; details?: unknown }

Emitted when an error occurs

Protected Properties

listeningModel?:

BuiltInModelConfig | undefined

Configuration for the speech-to-text model

speechModel?:

BuiltInModelConfig | undefined

Configuration for the text-to-speech model

speaker?:

string | undefined

Default speaker/voice ID

realtimeConfig?:

{ model?: string; apiKey?: string; options?: unknown } | undefined

Configuration for real-time speech-to-speech capabilities

Telemetry Support

MastraVoice includes built-in telemetry support through the traced method, which wraps method calls with performance tracking and error monitoring.

Notes

MastraVoice is an abstract class and cannot be instantiated directly
Implementations must provide concrete implementations for all abstract methods
The class provides a consistent interface across different voice service providers
Speech-to-speech capabilities are optional and provider-specific
The event system enables asynchronous communication for real-time interactions
Telemetry is automatically handled for all method calls