voice.listen()
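The listen() method is the speech-to-text method of the interface shared by all Mastra voice providers, although not every provider implements it (see Notes). It takes an audio stream as input and, depending on the provider, returns the transcribed text or emits it through events.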
Usage Example
```typescript
import { OpenAIVoice } from "@mastra/voice-openai";
import { createReadStream } from "fs";
import path from "path";

// Initialize a voice provider
const voice = new OpenAIVoice({
  listeningModel: {
    name: "whisper-1",
    apiKey: process.env.OPENAI_API_KEY,
  },
});

// Basic usage with a file stream
const audioFilePath = path.join(process.cwd(), "audio.mp3");
const audioStream = createReadStream(audioFilePath);
const transcript = await voice.listen(audioStream, {
  filetype: "mp3",
});
console.log("Transcribed text:", transcript);

// Using a microphone stream
// getMicrophoneStream() is a placeholder for your own audio-capture code
const microphoneStream = getMicrophoneStream();
const transcription = await voice.listen(microphoneStream);

// With provider-specific options
// A stream can only be consumed once, so open a fresh one for this call
const freshAudioStream = createReadStream(audioFilePath);
const transcriptWithOptions = await voice.listen(freshAudioStream, {
  language: "en",
  prompt: "This is a conversation about artificial intelligence.",
});
```
Parameters
- audioStream (NodeJS.ReadableStream): Audio stream to transcribe. This can be a file stream or a microphone stream.
- options? (object): Provider-specific options for speech recognition.
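If the audio is already in memory (for example, an uploaded file held in a Buffer) rather than on disk or coming from a microphone, it can be wrapped in a readable stream before being passed as audioStream. A minimal sketch using Node's built-in `Readable.from`; the helper name `bufferToAudioStream` is hypothetical and not part of Mastra.

```typescript
import { Readable } from "stream";

// Hypothetical helper (not part of Mastra): wraps audio bytes that are already
// in memory in a byte-mode Readable, like the one fs.createReadStream() returns.
function bufferToAudioStream(audio: Buffer): NodeJS.ReadableStream {
  // A one-element array keeps the whole buffer as a single chunk;
  // objectMode: false makes the stream emit raw bytes.
  return Readable.from([audio], { objectMode: false });
}

// Usage, with `voice` configured as in the usage example:
// const transcript = await voice.listen(bufferToAudioStream(audioBytes), { filetype: "mp3" });
```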
Return Value
Returns one of the following:
- Promise<string>: A promise that resolves to the transcribed text
- Promise<NodeJS.ReadableStream>: A promise that resolves to a stream of transcribed text (for streaming transcription)
- Promise<void>: For real-time providers that emit ‘writing’ events instead of returning text directly
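Because the return type differs between providers, code that has to work with several of them can normalize the result. A minimal sketch, assuming a streaming provider yields its text as string or Buffer chunks; `normalizeTranscript` and `isReadable` are hypothetical helpers, and a `void` result is mapped to `undefined` because realtime providers deliver the text through ‘writing’ events instead.

```typescript
import { Readable } from "stream";

// Type guard: treats any object with an async iterator as a readable stream.
function isReadable(x: unknown): x is NodeJS.ReadableStream {
  return typeof x === "object" && x !== null && Symbol.asyncIterator in x;
}

// Hypothetical helper (not part of Mastra): reduces any of the documented
// listen() return values to a plain string.
// - string: returned unchanged
// - readable stream: all chunks are collected and decoded as UTF-8
// - void: returns undefined (realtime providers emit 'writing' events instead)
async function normalizeTranscript(
  result: string | NodeJS.ReadableStream | void,
): Promise<string | undefined> {
  if (typeof result === "string") {
    return result;
  }
  if (!isReadable(result)) {
    return undefined;
  }
  const chunks: Buffer[] = [];
  for await (const chunk of result) {
    // Decode once at the end so multi-byte characters split across chunks survive.
    chunks.push(typeof chunk === "string" ? Buffer.from(chunk, "utf8") : chunk);
  }
  return Buffer.concat(chunks).toString("utf8");
}

// Demo: a fake stream standing in for a streaming provider's result.
normalizeTranscript(Readable.from(["Hello, ", "world"])).then((text) => {
  console.log(text); // Hello, world
});
```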
Provider-Specific Options
Each voice provider may support additional options specific to its implementation. Here are some examples:
OpenAI
- options.filetype? (string, default: 'mp3'): Audio file format (e.g., 'mp3', 'wav', 'm4a')
- options.prompt? (string): Text to guide the model's transcription
- options.language? (string): Language code (e.g., 'en', 'fr', 'de')
Google
- options.stream? (boolean, default: false): Whether to use streaming recognition
- options.config? (object, default: { encoding: 'LINEAR16', languageCode: 'en-US' }): Recognition configuration from the Google Cloud Speech-to-Text API
Deepgram
- options.model? (string, default: 'nova-2'): Deepgram model to use for transcription
- options.language? (string, default: 'en'): Language code for transcription
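The option lists above can be written down as TypeScript types, which makes the documented defaults explicit. A sketch only: the interface and function names are hypothetical (Mastra exports its own types), the Google `config` type is a simplified subset of the real recognition config, and the `with…Defaults` helpers just fill in the defaults listed above when an option is omitted.

```typescript
// Hypothetical types mirroring the documented options (not Mastra's exported types).
interface OpenAIListenOptions {
  filetype?: string; // default 'mp3'
  prompt?: string;
  language?: string;
}

interface GoogleListenOptions {
  stream?: boolean; // default false
  // Simplified subset of the Google Cloud Speech-to-Text recognition config.
  config?: { encoding?: string; languageCode?: string };
}

interface DeepgramListenOptions {
  model?: string; // default 'nova-2'
  language?: string; // default 'en'
}

// `??` (rather than object spread alone) also replaces an explicit `undefined`.
function withOpenAIDefaults(o: OpenAIListenOptions = {}): OpenAIListenOptions {
  return { ...o, filetype: o.filetype ?? "mp3" };
}

function withGoogleDefaults(o: GoogleListenOptions = {}): GoogleListenOptions {
  return {
    ...o,
    stream: o.stream ?? false,
    config: o.config ?? { encoding: "LINEAR16", languageCode: "en-US" },
  };
}

function withDeepgramDefaults(o: DeepgramListenOptions = {}): DeepgramListenOptions {
  return { ...o, model: o.model ?? "nova-2", language: o.language ?? "en" };
}
```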
Realtime Voice Providers
When using realtime voice providers like OpenAIRealtimeVoice, the listen() method behaves differently:
- Instead of returning transcribed text, it emits ‘writing’ events with the transcribed text
- You need to register an event listener to receive the transcription
```typescript
import { OpenAIRealtimeVoice } from "@mastra/voice-openai-realtime";

const voice = new OpenAIRealtimeVoice();
await voice.connect();

// Register event listener for transcription
voice.on("writing", ({ text, role }) => {
  console.log(`${role}: ${text}`);
});

// This will emit 'writing' events instead of returning text
// getMicrophoneStream() is a placeholder for your own audio-capture code
const microphoneStream = getMicrophoneStream();
await voice.listen(microphoneStream);
```
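Since realtime providers report text through events, it can help to accumulate the ‘writing’ payloads into one transcript per speaker. A minimal sketch: it relies only on the `on("writing", ({ text, role }) => ...)` shape from the realtime example and assumes each event carries an incremental chunk rather than the full text so far. The `TranscriptCollector` class and `WritingEventSource` interface are hypothetical, and a plain Node `EventEmitter` stands in for the provider in the demo.

```typescript
import { EventEmitter } from "events";

// Payload shape of the 'writing' event, as used in the example above.
interface WritingEvent {
  text: string;
  role: string;
}

// Hypothetical minimal interface: anything exposing on("writing", ...).
interface WritingEventSource {
  on(event: "writing", listener: (e: WritingEvent) => void): unknown;
}

// Hypothetical helper: appends each 'writing' chunk to a per-role transcript.
class TranscriptCollector {
  private readonly byRole = new Map<string, string>();

  constructor(source: WritingEventSource) {
    source.on("writing", ({ text, role }) => {
      this.byRole.set(role, (this.byRole.get(role) ?? "") + text);
    });
  }

  transcriptFor(role: string): string {
    return this.byRole.get(role) ?? "";
  }
}

// Demo: a plain EventEmitter standing in for a realtime voice provider.
const fakeVoice = new EventEmitter();
const collector = new TranscriptCollector(fakeVoice);
fakeVoice.emit("writing", { text: "Hello ", role: "user" });
fakeVoice.emit("writing", { text: "there", role: "user" });
console.log(collector.transcriptFor("user")); // prints: Hello there
```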
Using with CompositeVoice
When using CompositeVoice, the listen() method delegates to the configured listening provider:
```typescript
import { CompositeVoice } from "@mastra/core/voice";
import { OpenAIVoice } from "@mastra/voice-openai";
import { PlayAIVoice } from "@mastra/voice-playai";
import { createReadStream } from "fs";

const voice = new CompositeVoice({
  listenProvider: new OpenAIVoice(),
  speakProvider: new PlayAIVoice(),
});

// This will use the OpenAIVoice provider
const audioStream = createReadStream("audio.mp3");
const transcript = await voice.listen(audioStream);
```
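The delegation can be pictured with a small stand-in. This is not the source of Mastra's CompositeVoice: `SplitVoice`, `ListenProvider`, and `SpeakProvider` are hypothetical names with simplified signatures, and the sketch shows only the routing idea, where listen() goes to the listening provider and speak() to the speaking provider.

```typescript
import { Readable } from "stream";

// Simplified, hypothetical provider interfaces (Mastra's real types are richer).
interface ListenProvider {
  listen(audio: NodeJS.ReadableStream, options?: Record<string, unknown>): Promise<string>;
}

interface SpeakProvider {
  speak(text: string): Promise<NodeJS.ReadableStream>;
}

// Hypothetical composite: routes each call to the provider responsible for it.
class SplitVoice {
  private readonly listenProvider: ListenProvider;
  private readonly speakProvider: SpeakProvider;

  constructor(listenProvider: ListenProvider, speakProvider: SpeakProvider) {
    this.listenProvider = listenProvider;
    this.speakProvider = speakProvider;
  }

  listen(audio: NodeJS.ReadableStream, options?: Record<string, unknown>): Promise<string> {
    return this.listenProvider.listen(audio, options);
  }

  speak(text: string): Promise<NodeJS.ReadableStream> {
    return this.speakProvider.speak(text);
  }
}

// Demo with fake providers, so no API keys are needed.
const split = new SplitVoice(
  { listen: async () => "transcribed by the listening provider" },
  { speak: async (text) => Readable.from([text]) },
);
split.listen(Readable.from([Buffer.from([0])])).then((text) => console.log(text));
```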
Notes
- Not all voice providers support speech-to-text functionality (e.g., PlayAI, Speechify)
- The behavior of listen() may vary slightly between providers, but all implementations follow the same basic interface
- When using a realtime voice provider, the method might not return text directly but instead emit a ‘writing’ event
- The audio format supported depends on the provider. Common formats include MP3, WAV, and M4A
- Some providers support streaming transcription, where text is returned as it’s transcribed
- For best performance, consider closing or ending the audio stream when you’re done with it
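The note about ending the audio stream can be followed with a `try`/`finally` block that releases the input whether transcription succeeds or fails. A sketch, assuming the input is a Node `Readable` (which provides `destroy()` and `destroyed`); `listenAndRelease` and `ListenFn` are hypothetical names, and `listen` is typed loosely so any provider's method fits.

```typescript
import { Readable } from "stream";

// Loose, hypothetical shape covering any provider's listen() call.
type ListenFn = (audio: NodeJS.ReadableStream) => Promise<unknown>;

// Hypothetical helper: runs listen() and always destroys the input stream
// afterwards, freeing file descriptors or capture handles even on errors.
async function listenAndRelease(listen: ListenFn, audio: Readable): Promise<unknown> {
  try {
    return await listen(audio);
  } finally {
    if (!audio.destroyed) {
      audio.destroy();
    }
  }
}
```

Passing an arrow function, as in `listenAndRelease((s) => voice.listen(s), stream)`, keeps `voice` bound as `this`; passing `voice.listen` directly would detach the method from its provider.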
Related Methods
- voice.speak() - Converts text to speech
- voice.send() - Sends audio data to the voice provider in real-time
- voice.on() - Registers an event listener for voice events