# MastraVoice

The MastraVoice class is an abstract base class that defines the core interface for voice services in Mastra. All voice provider implementations (like OpenAI, Deepgram, PlayAI, Speechify) extend this class to provide their specific functionality. The class now includes support for real-time speech-to-speech capabilities through WebSocket connections.

## Usage Example

```typescript
import { MastraVoice } from "@mastra/core/voice";

// Create a voice provider implementation
class MyVoiceProvider extends MastraVoice {
  constructor(config: {
    speechModel?: BuiltInModelConfig;
    listeningModel?: BuiltInModelConfig;
    speaker?: string;
    realtimeConfig?: {
      model?: string;
      apiKey?: string;
      options?: unknown;
    };
  }) {
    super({
      speechModel: config.speechModel,
      listeningModel: config.listeningModel,
      speaker: config.speaker,
      realtimeConfig: config.realtimeConfig,
    });
  }

  // Implement required abstract methods
  async speak(
    input: string | NodeJS.ReadableStream,
    options?: { speaker?: string },
  ): Promise<NodeJS.ReadableStream | void> {
    // Implement text-to-speech conversion
  }

  async listen(
    audioStream: NodeJS.ReadableStream,
    options?: unknown,
  ): Promise<string | NodeJS.ReadableStream | void> {
    // Implement speech-to-text conversion
  }

  async getSpeakers(): Promise<
    Array<{ voiceId: string; [key: string]: unknown }>
  > {
    // Return list of available voices
  }

  // Optional speech-to-speech methods
  async connect(): Promise<void> {
    // Establish WebSocket connection for speech-to-speech communication
  }

  async send(audioData: NodeJS.ReadableStream | Int16Array): Promise<void> {
    // Stream audio data in speech-to-speech
  }

  async answer(): Promise<void> {
    // Trigger voice provider to respond
  }

  addTools(tools: Array<unknown>): void {
    // Add tools for the voice provider to use
  }

  close(): void {
    // Close WebSocket connection
  }

  on(event: string, callback: (data: unknown) => void): void {
    // Register event listener
  }

  off(event: string, callback: (data: unknown) => void): void {
    // Remove event listener
  }
}
```

## Constructor Parameters

**config?:** (`VoiceConfig`): Configuration object for the voice service

**config.speechModel?:** (`BuiltInModelConfig`): Configuration for the text-to-speech model

**config.listeningModel?:** (`BuiltInModelConfig`): Configuration for the speech-to-text model

**config.speaker?:** (`string`): Default speaker/voice ID to use

**config.name?:** (`string`): Name for the voice provider instance

**config.realtimeConfig?:** (`object`): Configuration for real-time speech-to-speech capabilities

### BuiltInModelConfig

**name:** (`string`): Name of the model to use

**apiKey?:** (`string`): API key for the model service

### RealtimeConfig

**model?:** (`string`): Model to use for real-time speech-to-speech capabilities

**apiKey?:** (`string`): API key for the real-time service

**options?:** (`unknown`): Provider-specific options for real-time capabilities

## Abstract Methods

These methods must be implemented by unknown class extending MastraVoice.

### speak()

Converts text to speech using the configured speech model.

```typescript
abstract speak(
  input: string | NodeJS.ReadableStream,
  options?: {
    speaker?: string;
    [key: string]: unknown;
  }
): Promise<NodeJS.ReadableStream | void>
```

Purpose:

- Takes text input and converts it to speech using the provider's text-to-speech service
- Supports both string and stream input for flexibility
- Allows overriding the default speaker/voice through options
- Returns a stream of audio data that can be played or saved
- May return void if the audio is handled by emitting 'speaking' event

### listen()

Converts speech to text using the configured listening model.

```typescript
abstract listen(
  audioStream: NodeJS.ReadableStream,
  options?: {
    [key: string]: unknown;
  }
): Promise<string | NodeJS.ReadableStream | void>
```

Purpose:

- Takes an audio stream and converts it to text using the provider's speech-to-text service
- Supports provider-specific options for transcription configuration
- Can return either a complete text transcription or a stream of transcribed text
- Not all providers support this functionality (e.g., PlayAI, Speechify)
- May return void if the transcription is handled by emitting 'writing' event

### getSpeakers()

Returns a list of available voices supported by the provider.

```typescript
abstract getSpeakers(): Promise<Array<{ voiceId: string; [key: string]: unknown }>>
```

Purpose:

- Retrieves the list of available voices/speakers from the provider
- Each voice must have at least a voiceId property
- Providers can include additional metadata about each voice
- Used to discover available voices for text-to-speech conversion

## Optional Methods

These methods have default implementations but can be overridden by voice providers that support speech-to-speech capabilities.

### connect()

Establishes a WebSocket or WebRTC connection for communication.

```typescript
connect(config?: unknown): Promise<void>
```

Purpose:

- Initializes a connection to the voice service for communication
- Must be called before using features like send() or answer()
- Returns a Promise that resolves when the connection is established
- Configuration is provider-specific

### send()

Streams audio data in real-time to the voice provider.

```typescript
send(audioData: NodeJS.ReadableStream | Int16Array): Promise<void>
```

Purpose:

- Sends audio data to the voice provider for real-time processing
- Useful for continuous audio streaming scenarios like live microphone input
- Supports both ReadableStream and Int16Array audio formats
- Must be in connected state before calling this method

### answer()

Triggers the voice provider to generate a response.

```typescript
answer(): Promise<void>
```

Purpose:

- Sends a signal to the voice provider to generate a response
- Used in real-time conversations to prompt the AI to respond
- Response will be emitted through the event system (e.g., 'speaking' event)

### addTools()

Equips the voice provider with tools that can be used during conversations.

```typescript
addTools(tools: Array<Tool>): void
```

Purpose:

- Adds tools that the voice provider can use during conversations
- Tools can extend the capabilities of the voice provider
- Implementation is provider-specific

### close()

Disconnects from the WebSocket or WebRTC connection.

```typescript
close(): void
```

Purpose:

- Closes the connection to the voice service
- Cleans up resources and stops any ongoing real-time processing
- Should be called when you're done with the voice instance

### on()

Registers an event listener for voice events.

```typescript
on<E extends VoiceEventType>(
  event: E,
  callback: (data: E extends keyof VoiceEventMap ? VoiceEventMap[E] : unknown) => void,
): void
```

Purpose:

- Registers a callback function to be called when the specified event occurs
- Standard events include 'speaking', 'writing', and 'error'
- Providers can emit custom events as well
- Event data structure depends on the event type

### off()

Removes an event listener.

```typescript
off<E extends VoiceEventType>(
  event: E,
  callback: (data: E extends keyof VoiceEventMap ? VoiceEventMap[E] : unknown) => void,
): void
```

Purpose:

- Removes a previously registered event listener
- Used to clean up event handlers when they're no longer needed

## Event System

The MastraVoice class includes an event system for real-time communication. Standard event types include:

**speaking:** (`{ text: string; audioStream?: NodeJS.ReadableStream; audio?: Int16Array }`): Emitted when the voice provider is speaking, contains audio data

**writing:** (`{ text: string, role: string }`): Emitted when text is transcribed from speech

**error:** (`{ message: string; code?: string; details?: unknown }`): Emitted when an error occurs

## Protected Properties

**listeningModel?:** (`BuiltInModelConfig | undefined`): Configuration for the speech-to-text model

**speechModel?:** (`BuiltInModelConfig | undefined`): Configuration for the text-to-speech model

**speaker?:** (`string | undefined`): Default speaker/voice ID

**realtimeConfig?:** (`{ model?: string; apiKey?: string; options?: unknown } | undefined`): Configuration for real-time speech-to-speech capabilities

## Telemetry Support

MastraVoice includes built-in telemetry support through the `traced` method, which wraps method calls with performance tracking and error monitoring.

## Notes

- MastraVoice is an abstract class and cannot be instantiated directly
- Implementations must provide concrete implementations for all abstract methods
- The class provides a consistent interface across different voice service providers
- Speech-to-speech capabilities are optional and provider-specific
- The event system enables asynchronous communication for real-time interactions
- Telemetry is automatically handled for all method calls