# voice.listen()

The `listen()` method is a core function available in all Mastra voice providers that converts speech to text. It takes an audio stream as input and returns the transcribed text.

## Parameters

**audioStream** (`NodeJS.ReadableStream`): Audio stream to transcribe. This can be a file stream or a microphone stream.

**options** (`object`): Provider-specific options for speech recognition.

## Return value

Returns one of the following:

- `Promise<string>`: A promise that resolves to the transcribed text
- `Promise<NodeJS.ReadableStream>`: A promise that resolves to a stream of transcribed text (for streaming transcription)
- `Promise<void>`: For real-time providers that emit 'writing' events instead of returning text directly

## Provider-specific options

Each voice provider may support additional options specific to its implementation. Here are some examples:

### OpenAI

**options** (`Options`): Configuration options.

**options.filetype** (`string`): Audio file format (e.g., 'mp3', 'wav', 'm4a')

**options.prompt** (`string`): Text to guide the model's transcription

**options.language** (`string`): Language code (e.g., 'en', 'fr', 'de')

### Google

**options** (`Options`): Configuration options.

**options.stream** (`boolean`): Whether to use streaming recognition

**options.config** (`object`): Recognition configuration from the Google Cloud Speech-to-Text API

### Deepgram

**options** (`Options`): Configuration options.

**options.model** (`string`): Deepgram model to use for transcription

**options.language** (`string`): Language code for transcription
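As a minimal sketch of passing Deepgram-specific options, assuming the `DeepgramVoice` provider from `@mastra/voice-deepgram`, a default constructor that reads the API key from the environment, and the `nova-2` model name (check the provider's own reference for the exact constructor options):

```typescript
import { DeepgramVoice } from '@mastra/voice-deepgram'
import { createReadStream } from 'fs'

// Assumption: the default constructor picks up DEEPGRAM_API_KEY from the environment
const voice = new DeepgramVoice()

const audioStream = createReadStream('meeting.wav')

// Deepgram-specific options are passed as the second argument to listen()
const transcript = await voice.listen(audioStream, {
  model: 'nova-2', // assumed Deepgram model name
  language: 'en',
})

console.log('Transcribed text:', transcript)
```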
## Usage example

```typescript
import { OpenAIVoice } from '@mastra/voice-openai'
import { getMicrophoneStream } from '@mastra/node-audio'
import { createReadStream } from 'fs'
import path from 'path'

// Initialize a voice provider
const voice = new OpenAIVoice({
  listeningModel: {
    name: 'whisper-1',
    apiKey: process.env.OPENAI_API_KEY,
  },
})

// Basic usage with a file stream
const audioFilePath = path.join(process.cwd(), 'audio.mp3')
const audioStream = createReadStream(audioFilePath)
const transcript = await voice.listen(audioStream, {
  filetype: 'mp3',
})
console.log('Transcribed text:', transcript)

// Using a microphone stream
const microphoneStream = getMicrophoneStream() // Assume this function gets audio input
const transcription = await voice.listen(microphoneStream)

// With provider-specific options
const transcriptWithOptions = await voice.listen(audioStream, {
  language: 'en',
  prompt: 'This is a conversation about artificial intelligence.',
})
```

## Using with `CompositeVoice`

When using `CompositeVoice`, the `listen()` method delegates to the configured listening provider:

```typescript
import { CompositeVoice } from '@mastra/core/voice'
import { OpenAIVoice } from '@mastra/voice-openai'
import { PlayAIVoice } from '@mastra/voice-playai'

const voice = new CompositeVoice({
  input: new OpenAIVoice(),
  output: new PlayAIVoice(),
})

// This will use the OpenAIVoice provider
const transcript = await voice.listen(audioStream)
```

### Using AI SDK model providers

You can also use AI SDK transcription models directly with `CompositeVoice`:

```typescript
import { CompositeVoice } from '@mastra/core/voice'
import { PlayAIVoice } from '@mastra/voice-playai'
import { openai } from '@ai-sdk/openai'

// Use AI SDK transcription models
const voice = new CompositeVoice({
  input: openai.transcription('whisper-1'), // AI SDK model
  output: new PlayAIVoice(), // Mastra provider
})

// Works the same way
const transcript = await voice.listen(audioStream)

// Provider-specific options can be passed through
const transcriptWithOptions = await voice.listen(audioStream, {
  providerOptions: {
    openai: {
      language: 'en',
      prompt: 'This is about AI',
    },
  },
})
```

See the [CompositeVoice reference](https://mastra.ai/reference/voice/composite-voice) for more details on AI SDK integration.

## Realtime voice providers

When using realtime voice providers like `OpenAIRealtimeVoice`, the `listen()` method behaves differently:

- Instead of returning transcribed text, it emits 'writing' events with the transcribed text
- You need to register an event listener to receive the transcription

```typescript
import { OpenAIRealtimeVoice } from '@mastra/voice-openai-realtime'
import { getMicrophoneStream } from '@mastra/node-audio'

const voice = new OpenAIRealtimeVoice()
await voice.connect()

// Register event listener for transcription
voice.on('writing', ({ text, role }) => {
  console.log(`${role}: ${text}`)
})

// This will emit 'writing' events instead of returning text
const microphoneStream = getMicrophoneStream()
await voice.listen(microphoneStream)
```

## Notes

- Not all voice providers support speech-to-text functionality (e.g., PlayAI, Speechify)
- The behavior of `listen()` may vary slightly between providers, but all implementations follow the same basic interface
- When using a realtime voice provider, the method might not return text directly but instead emit a 'writing' event
- The audio formats supported depend on the provider. Common formats include MP3, WAV, and M4A
- Some providers support streaming transcription, where text is returned as it's transcribed
- For best performance, consider closing or ending the audio stream when you're done with it

## Related methods

- [voice.speak()](https://mastra.ai/reference/voice/voice.speak) - Converts text to speech
- [voice.send()](https://mastra.ai/reference/voice/voice.send) - Sends audio data to the voice provider in real-time
- [voice.on()](https://mastra.ai/reference/voice/voice.on) - Registers an event listener for voice events