# voice.listen()

The `listen()` method is a core function available in all Mastra voice providers that converts speech to text. It takes an audio stream as input and returns the transcribed text.

## Parameters

**audioStream** (`NodeJS.ReadableStream`): Audio stream to transcribe. This can be a file stream or a microphone stream.

**options** (`object`): Provider-specific options for speech recognition.

## Return value

Returns one of the following:

- `Promise<string>`: A promise that resolves to the transcribed text
- `Promise<NodeJS.ReadableStream>`: A promise that resolves to a stream of transcribed text (for streaming transcription)
- `Promise<void>`: For real-time providers that emit 'writing' events instead of returning text directly

## Provider-specific options

Each voice provider may support additional options specific to its implementation. Here are some examples:

### OpenAI

**options** (`Options`): Configuration options.

**options.filetype** (`string`): Audio file format (e.g., 'mp3', 'wav', 'm4a')

**options.prompt** (`string`): Text to guide the model's transcription

**options.language** (`string`): Language code (e.g., 'en', 'fr', 'de')

### Google

**options** (`Options`): Configuration options.

**options.stream** (`boolean`): Whether to use streaming recognition

**options.config** (`object`): Recognition configuration from the Google Cloud Speech-to-Text API

### Deepgram

**options** (`Options`): Configuration options.

**options.model** (`string`): Deepgram model to use for transcription

**options.language** (`string`): Language code for transcription
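As a minimal sketch of passing Deepgram-specific options, assuming the `DeepgramVoice` provider from `@mastra/voice-deepgram`, a default constructor that reads the API key from the environment, and the `nova-2` model name (check the provider's own reference for the exact constructor options):

```typescript
import { DeepgramVoice } from '@mastra/voice-deepgram'
import { createReadStream } from 'fs'

// Assumption: the default constructor picks up DEEPGRAM_API_KEY from the environment
const voice = new DeepgramVoice()

const audioStream = createReadStream('meeting.wav')

// Deepgram-specific options are passed as the second argument to listen()
const transcript = await voice.listen(audioStream, {
  model: 'nova-2', // assumed Deepgram model name
  language: 'en',
})

console.log('Transcribed text:', transcript)
```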
## Usage example

```typescript
import { OpenAIVoice } from '@mastra/voice-openai'
import { getMicrophoneStream } from '@mastra/node-audio'
import { createReadStream } from 'fs'
import path from 'path'

// Initialize a voice provider
const voice = new OpenAIVoice({
  listeningModel: {
    name: 'whisper-1',
    apiKey: process.env.OPENAI_API_KEY,
  },
})

// Basic usage with a file stream
const audioFilePath = path.join(process.cwd(), 'audio.mp3')
const audioStream = createReadStream(audioFilePath)
const transcript = await voice.listen(audioStream, {
  filetype: 'mp3',
})
console.log('Transcribed text:', transcript)

// Using a microphone stream
const microphoneStream = getMicrophoneStream() // Assume this function gets audio input
const transcription = await voice.listen(microphoneStream)

// With provider-specific options
const transcriptWithOptions = await voice.listen(audioStream, {
  language: 'en',
  prompt: 'This is a conversation about artificial intelligence.',
})
```

## Using with `CompositeVoice`

When using `CompositeVoice`, the `listen()` method delegates to the configured listening provider:

```typescript
import { CompositeVoice } from '@mastra/core/voice'
import { OpenAIVoice } from '@mastra/voice-openai'
import { PlayAIVoice } from '@mastra/voice-playai'

const voice = new CompositeVoice({
  input: new OpenAIVoice(),
  output: new PlayAIVoice(),
})

// This will use the OpenAIVoice provider
const transcript = await voice.listen(audioStream)
```

### Using AI SDK model providers

You can also use AI SDK transcription models directly with `CompositeVoice`:

```typescript
import { CompositeVoice } from '@mastra/core/voice'
import { PlayAIVoice } from '@mastra/voice-playai'
import { openai } from '@ai-sdk/openai'

// Use AI SDK transcription models
const voice = new CompositeVoice({
  input: openai.transcription('whisper-1'), // AI SDK model
  output: new PlayAIVoice(), // Mastra provider
})

// Works the same way
const transcript = await voice.listen(audioStream)

// Provider-specific options can be passed through
const transcriptWithOptions = await voice.listen(audioStream, {
  providerOptions: {
    openai: {
      language: 'en',
      prompt: 'This is about AI',
    },
  },
})
```

See the [CompositeVoice reference](https://mastra.ai/reference/voice/composite-voice) for more details on AI SDK integration.

## Realtime voice providers

When using realtime voice providers like `OpenAIRealtimeVoice`, the `listen()` method behaves differently:

- Instead of returning transcribed text, it emits 'writing' events with the transcribed text
- You need to register an event listener to receive the transcription

```typescript
import { OpenAIRealtimeVoice } from '@mastra/voice-openai-realtime'
import { getMicrophoneStream } from '@mastra/node-audio'

const voice = new OpenAIRealtimeVoice()
await voice.connect()

// Register event listener for transcription
voice.on('writing', ({ text, role }) => {
  console.log(`${role}: ${text}`)
})

// This will emit 'writing' events instead of returning text
const microphoneStream = getMicrophoneStream()
await voice.listen(microphoneStream)
```

## Notes

- Not all voice providers support speech-to-text functionality (e.g., PlayAI, Speechify)
- The behavior of `listen()` may vary slightly between providers, but all implementations follow the same basic interface
- When using a realtime voice provider, the method might not return text directly but instead emit a 'writing' event
- The audio formats supported depend on the provider. Common formats include MP3, WAV, and M4A
- Some providers support streaming transcription, where text is returned as it's transcribed
- For best performance, consider closing or ending the audio stream when you're done with it

## Related methods

- [voice.speak()](https://mastra.ai/reference/voice/voice.speak) - Converts text to speech
- [voice.send()](https://mastra.ai/reference/voice/voice.send) - Sends audio data to the voice provider in real-time
- [voice.on()](https://mastra.ai/reference/voice/voice.on) - Registers an event listener for voice events