# OpenAI

The `OpenAIVoice` class in Mastra provides text-to-speech and speech-to-text capabilities using OpenAI's models.

## Usage Example

```typescript
import { OpenAIVoice } from "@mastra/voice-openai";

// Initialize with default configuration using environment variables
const voice = new OpenAIVoice();

// Or initialize with specific configuration
const voiceWithConfig = new OpenAIVoice({
  speechModel: {
    name: "tts-1-hd",
    apiKey: "your-openai-api-key",
  },
  listeningModel: {
    name: "whisper-1",
    apiKey: "your-openai-api-key",
  },
  speaker: "alloy", // Default voice
});

// Convert text to speech
const audioStream = await voice.speak("Hello, how can I help you?", {
  speaker: "nova", // Override default voice
  speed: 1.2, // Adjust speech speed
});

// Convert speech to text
const text = await voice.listen(audioStream, {
  filetype: "mp3",
});
```

## Configuration

### Constructor Options

**speechModel?:** (`OpenAIConfig`): Configuration for text-to-speech synthesis. (Default: `{ name: 'tts-1' }`)

**listeningModel?:** (`OpenAIConfig`): Configuration for speech-to-text recognition. (Default: `{ name: 'whisper-1' }`)

**speaker?:** (`OpenAIVoiceId`): Default voice ID for speech synthesis. (Default: `'alloy'`)

### OpenAIConfig

**name?:** (`'tts-1' | 'tts-1-hd' | 'whisper-1'`): Model name. For speech synthesis, use `'tts-1-hd'` for higher-quality audio; `'whisper-1'` is the speech-to-text model.

**apiKey?:** (`string`): OpenAI API key. Falls back to the `OPENAI_API_KEY` environment variable.
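The fallback order described above can be sketched as follows. This is an illustration only: `resolveApiKey`, `ModelConfig`, and `Env` are hypothetical names, not part of `@mastra/voice-openai`, and throwing when no key is found is an assumption about reasonable behavior rather than a description of what the library does.

```typescript
// Hypothetical helper, not part of @mastra/voice-openai. It mirrors the
// documented precedence: explicit apiKey first, then OPENAI_API_KEY.
type ModelConfig = { name?: string; apiKey?: string };
type Env = Record<string, string | undefined>;

function resolveApiKey(config: ModelConfig, env: Env): string {
  const key = config.apiKey ?? env["OPENAI_API_KEY"];
  if (!key) {
    // Assumption: fail early instead of sending an unauthenticated request.
    throw new Error("No API key: set apiKey or OPENAI_API_KEY");
  }
  return key;
}

// An explicit key takes precedence over the environment value.
const fromConfig = resolveApiKey(
  { apiKey: "sk-config" },
  { OPENAI_API_KEY: "sk-env" },
);

// With no explicit key, the environment value is used.
const fromEnv = resolveApiKey({ name: "tts-1" }, { OPENAI_API_KEY: "sk-env" });
```

In a Node.js program, `process.env` would be the natural value to pass as the second argument.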

## Methods

### speak()

Converts text to speech using OpenAI's text-to-speech models.

**input:** (`string | NodeJS.ReadableStream`): Text or text stream to convert to speech.

**options.speaker?:** (`OpenAIVoiceId`): Voice ID to use for speech synthesis. (Default: the constructor's `speaker` value)

**options.speed?:** (`number`): Speech speed multiplier. OpenAI's speech API accepts values from 0.25 to 4.0. (Default: `1.0`)

Returns: `Promise<NodeJS.ReadableStream>`
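Because `speak()` resolves to a stream, callers often need to gather the audio into a single buffer before saving or sending it. The sketch below shows one way to do that; `collectAudio` and `fakeAudio` are hypothetical names introduced here, and the generator only stands in for a real audio stream so the example runs without network access. It relies on Node.js readable streams being async iterable.

```typescript
// Hypothetical helper: gathers binary chunks from any async-iterable source
// into one contiguous Uint8Array.
async function collectAudio(
  source: AsyncIterable<Uint8Array | string>,
): Promise<Uint8Array> {
  const chunks: Uint8Array[] = [];
  let total = 0;
  for await (const chunk of source) {
    if (typeof chunk === "string") {
      // Audio data should arrive as bytes, never as decoded text.
      throw new TypeError("Expected binary audio chunks, got a string");
    }
    chunks.push(chunk);
    total += chunk.length;
  }
  // Copy every chunk into a single buffer, preserving order.
  const out = new Uint8Array(total);
  let offset = 0;
  for (const chunk of chunks) {
    out.set(chunk, offset);
    offset += chunk.length;
  }
  return out;
}

// Stand-in for the stream returned by speak(), so the sketch runs offline.
async function* fakeAudio(): AsyncGenerator<Uint8Array> {
  yield new Uint8Array([1, 2]);
  yield new Uint8Array([3]);
}
```

The collected bytes could then be written to disk, for example with `fs.promises.writeFile`.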

### listen()

Transcribes audio using OpenAI's Whisper model.

**audioStream:** (`NodeJS.ReadableStream`): Audio stream to transcribe.

**options.filetype?:** (`string`): Audio format of the input stream. (Default: `'mp3'`)

Returns: `Promise<string>`
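When the audio comes from a file, the `filetype` option can be derived from the file name instead of being hard-coded. The following is a minimal sketch under stated assumptions: `filetypeFromPath` is a hypothetical helper, and the list contains only the formats named on this page, so the set actually accepted by the API may be wider.

```typescript
// Formats named in this page's notes; the real set may be larger.
const KNOWN_FILETYPES = ["mp3", "wav", "webm"] as const;
type KnownFiletype = (typeof KNOWN_FILETYPES)[number];

// Hypothetical helper: derive the `filetype` option from a file name.
// Returns undefined when there is no extension or it is not in the list.
function filetypeFromPath(path: string): KnownFiletype | undefined {
  const dot = path.lastIndexOf(".");
  if (dot < 0) return undefined;
  const ext = path.slice(dot + 1).toLowerCase();
  const known = KNOWN_FILETYPES as readonly string[];
  return known.indexOf(ext) !== -1 ? (ext as KnownFiletype) : undefined;
}
```

A caller could fall back to the documented default of `'mp3'` when the helper returns `undefined`, or reject the file outright.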

### getSpeakers()

Returns an array of available voices. Each entry contains:

**voiceId:** (`string`): Unique identifier for the voice.
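One use for this list is checking a requested voice before passing it to `speak()`. The sketch below assumes only the documented `voiceId` field; `pickSpeaker` is a hypothetical helper, and `sampleSpeakers` is sample data standing in for the real result of `getSpeakers()`.

```typescript
// Shape documented for each getSpeakers() entry.
type Speaker = { voiceId: string };

// Hypothetical helper: return the requested voice if it is listed,
// otherwise return the fallback.
function pickSpeaker(
  speakers: Speaker[],
  requested: string,
  fallback: string,
): string {
  return speakers.some((s) => s.voiceId === requested) ? requested : fallback;
}

// Sample data using only the voices named on this page.
const sampleSpeakers: Speaker[] = [{ voiceId: "alloy" }, { voiceId: "nova" }];

const chosen = pickSpeaker(sampleSpeakers, "nova", "alloy");
const defaulted = pickSpeaker(sampleSpeakers, "unknown-voice", "alloy");
```

Falling back to a known voice keeps a typo in user input from turning into a failed API request.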

## Notes

- API keys can be provided via constructor options or the `OPENAI_API_KEY` environment variable
- The `tts-1-hd` model produces higher-quality audio than `tts-1` but may take longer to generate it
- Speech recognition supports multiple audio formats including mp3, wav, and webm