voice.listen()
The listen() method is a core method on Mastra voice providers that converts speech to text. It takes an audio stream as input and returns the transcribed text.
Parameters
- audioStream: NodeJS.ReadableStream. Audio stream to transcribe. This can be a file stream or a microphone stream.
- options?: object. Provider-specific options for speech recognition.
Return Value
Returns one of the following:
- Promise<string>: A promise that resolves to the transcribed text
- Promise<NodeJS.ReadableStream>: A promise that resolves to a stream of transcribed text (for streaming transcription)
- Promise<void>: For realtime providers that emit 'writing' events instead of returning text directly
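Because the resolved value can be a string or a readable stream depending on the provider, callers that target more than one provider may want to normalize the result. A minimal sketch, assuming voice and audioStream are set up as in the Usage Example below; the streamToText helper is illustrative and not part of the Mastra API:
// Collect a streaming transcription into a single string (illustrative helper)
async function streamToText(stream: NodeJS.ReadableStream): Promise<string> {
  let text = "";
  for await (const chunk of stream) {
    text += chunk.toString();
  }
  return text;
}

const result = await voice.listen(audioStream);
if (typeof result === "string") {
  console.log("Transcript:", result);
} else if (result) {
  // Streaming transcription: drain the stream into a string
  console.log("Transcript:", await streamToText(result));
}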
Provider-Specific Options
Each voice provider may support additional options specific to their implementation. Here are some examples:
OpenAI
- options.filetype?: string (default: 'mp3'). Audio file format (e.g., 'mp3', 'wav', 'm4a').
- options.prompt?: string. Text to guide the model's transcription.
- options.language?: string. Language code (e.g., 'en', 'fr', 'de').
Google
- options.stream?: boolean (default: false). Whether to use streaming recognition.
- options.config?: object (default: { encoding: 'LINEAR16', languageCode: 'en-US' }). Recognition configuration from the Google Cloud Speech-to-Text API.
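For example, a short sketch passing these Google options. The GoogleVoice class and @mastra/voice-google package name are assumptions here, as is having Google Cloud credentials configured in the environment:
import { GoogleVoice } from "@mastra/voice-google"; // package/class name assumed
import { createReadStream } from "fs";

const googleVoice = new GoogleVoice(); // assumes Google Cloud credentials are configured

const googleTranscript = await googleVoice.listen(createReadStream("audio.wav"), {
  config: {
    encoding: "LINEAR16",
    languageCode: "en-US",
  },
});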
Deepgram
- options.model?: string (default: 'nova-2'). Deepgram model to use for transcription.
- options.language?: string (default: 'en'). Language code for transcription.
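And a similar sketch with the Deepgram options. The DeepgramVoice class and @mastra/voice-deepgram package name are assumptions, as is a DEEPGRAM_API_KEY in the environment:
import { DeepgramVoice } from "@mastra/voice-deepgram"; // package/class name assumed
import { createReadStream } from "fs";

const deepgramVoice = new DeepgramVoice(); // assumes DEEPGRAM_API_KEY is set in the environment

const deepgramTranscript = await deepgramVoice.listen(createReadStream("audio.mp3"), {
  model: "nova-2",
  language: "en",
});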
Usage Example
import { OpenAIVoice } from "@mastra/voice-openai";
import { getMicrophoneStream } from "@mastra/node-audio";
import { createReadStream } from "fs";
import path from "path";

// Initialize a voice provider
const voice = new OpenAIVoice({
  listeningModel: {
    name: "whisper-1",
    apiKey: process.env.OPENAI_API_KEY,
  },
});

// Basic usage with a file stream
const audioFilePath = path.join(process.cwd(), "audio.mp3");
const audioStream = createReadStream(audioFilePath);
const transcript = await voice.listen(audioStream, {
  filetype: "mp3",
});
console.log("Transcribed text:", transcript);

// Using a microphone stream
const microphoneStream = getMicrophoneStream(); // Assume this function gets audio input
const transcription = await voice.listen(microphoneStream);

// With provider-specific options (create a fresh stream, since the first one has already been consumed)
const transcriptWithOptions = await voice.listen(createReadStream(audioFilePath), {
  language: "en",
  prompt: "This is a conversation about artificial intelligence.",
});
Using with CompositeVoice
When using CompositeVoice, the listen() method delegates to the configured listening provider:
import { CompositeVoice } from "@mastra/core/voice";
import { OpenAIVoice } from "@mastra/voice-openai";
import { PlayAIVoice } from "@mastra/voice-playai";

const voice = new CompositeVoice({
  input: new OpenAIVoice(),
  output: new PlayAIVoice(),
});

// This will use the OpenAIVoice provider
const transcript = await voice.listen(audioStream);
Using AI SDK Model Providers
You can also use AI SDK transcription models directly with CompositeVoice:
import { CompositeVoice } from "@mastra/core/voice";
import { PlayAIVoice } from "@mastra/voice-playai";
import { openai } from "@ai-sdk/openai";

// Use AI SDK transcription models
const voice = new CompositeVoice({
  input: openai.transcription('whisper-1'), // AI SDK model
  output: new PlayAIVoice(), // Mastra provider
});

// Works the same way
const transcript = await voice.listen(audioStream);

// Provider-specific options can be passed through
const transcriptWithOptions = await voice.listen(audioStream, {
  providerOptions: {
    openai: {
      language: 'en',
      prompt: 'This is about AI',
    },
  },
});
See the CompositeVoice reference for more details on AI SDK integration.
Realtime Voice Providers
When using realtime voice providers like OpenAIRealtimeVoice, the listen() method behaves differently:
- Instead of returning transcribed text, it emits 'writing' events with the transcribed text
- You need to register an event listener to receive the transcription
import { OpenAIRealtimeVoice } from "@mastra/voice-openai-realtime";
import { getMicrophoneStream } from "@mastra/node-audio";
const voice = new OpenAIRealtimeVoice();
await voice.connect();
// Register event listener for transcription
voice.on("writing", ({ text, role }) => {
console.log(`${role}: ${text}`);
});
// This will emit 'writing' events instead of returning text
const microphoneStream = getMicrophoneStream();
await voice.listen(microphoneStream);
Notes
- Not all voice providers support speech-to-text functionality (e.g., PlayAI, Speechify)
- The behavior of listen() may vary slightly between providers, but all implementations follow the same basic interface
- When using a realtime voice provider, the method may not return text directly but instead emit a 'writing' event
- The supported audio formats depend on the provider. Common formats include MP3, WAV, and M4A
- Some providers support streaming transcription, where text is returned as it is transcribed
- For best performance, close or end the audio stream when you're done with it, as shown in the sketch below
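A minimal sketch of that cleanup, reusing the voice instance from the Usage Example above; destroy() is the standard Node.js way to release a file stream's resources:
import { createReadStream } from "fs";

const fileStream = createReadStream("audio.mp3");
try {
  const text = await voice.listen(fileStream, { filetype: "mp3" });
  console.log("Transcribed text:", text);
} finally {
  // Release the underlying file descriptor once transcription is finished
  fileStream.destroy();
}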
Related Methods
- voice.speak() - Converts text to speech
- voice.send() - Sends audio data to the voice provider in real-time
- voice.on() - Registers an event listener for voice events