voice.listen()

The listen() method is a core method available on all Mastra voice providers. It converts speech to text, taking an audio stream as input and returning the transcribed text.
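
A simplified sketch of the method's shape follows; the interface name is illustrative only and not an exported Mastra type, and the actual typings live in the voice provider packages:

// Sketch only: illustrative interface name, not part of the Mastra API surface
interface VoiceListenShape {
  listen(
    audioStream: NodeJS.ReadableStream,
    options?: Record<string, unknown>, // provider-specific options, documented below
  ): Promise<string | NodeJS.ReadableStream | void>;
}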

Parameters

audioStream: NodeJS.ReadableStream
    Audio stream to transcribe. This can be a file stream or a microphone stream.

options?: object
    Provider-specific options for speech recognition

Return Value

Returns one of the following, depending on the provider (a handling sketch follows this list):

  • Promise<string>: A promise that resolves to the transcribed text
  • Promise<NodeJS.ReadableStream>: A promise that resolves to a stream of transcribed text (for streaming transcription)
  • Promise<void>: For real-time providers that emit 'writing' events instead of returning text directly
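
Because the return shape depends on the provider, code that works with more than one provider may need to branch on the result. A minimal sketch, assuming a configured provider instance named voice and an audioStream that has already been opened:

import { Readable } from "node:stream";

const result = await voice.listen(audioStream);

if (typeof result === "string") {
  // Batch transcription: the full text is returned at once
  console.log("Transcript:", result);
} else if (result instanceof Readable) {
  // Streaming transcription: text arrives in chunks as it is produced
  for await (const chunk of result) {
    process.stdout.write(chunk.toString());
  }
}
// For realtime providers the promise resolves to void and transcription
// arrives via 'writing' events instead (see Realtime Voice Providers below).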

Provider-Specific Options

Each voice provider may support additional options specific to their implementation. Here are some examples:

OpenAI

options.filetype?: string = 'mp3'
    Audio file format (e.g., 'mp3', 'wav', 'm4a')

options.prompt?: string
    Text to guide the model's transcription

options.language?: string
    Language code (e.g., 'en', 'fr', 'de')

Google

options.stream?: boolean = false
    Whether to use streaming recognition

options.config?: object = { encoding: 'LINEAR16', languageCode: 'en-US' }
    Recognition configuration from the Google Cloud Speech-to-Text API
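
For example, these options could be passed to a Google speech-to-text provider. This is a sketch only: the GoogleVoice class and @mastra/voice-google package name are assumptions here, not confirmed by this page.

import { GoogleVoice } from "@mastra/voice-google"; // assumed package and class name
import { createReadStream } from "fs";

const voice = new GoogleVoice();

// Pass a Google Cloud Speech-to-Text recognition config; stream: false requests batch recognition
const transcript = await voice.listen(createReadStream("audio.wav"), {
  stream: false,
  config: {
    encoding: "LINEAR16",
    languageCode: "en-US",
  },
});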

Deepgram

options.model?: string = 'nova-2'
    Deepgram model to use for transcription

options.language?: string = 'en'
    Language code for transcription
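
Similarly, a Deepgram provider could be called with these options. This is a sketch only: the DeepgramVoice class and @mastra/voice-deepgram package name are assumptions here.

import { DeepgramVoice } from "@mastra/voice-deepgram"; // assumed package and class name
import { createReadStream } from "fs";

const voice = new DeepgramVoice();

// Select the Deepgram model and transcription language
const transcript = await voice.listen(createReadStream("audio.mp3"), {
  model: "nova-2",
  language: "en",
});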

Usage Example

import { OpenAIVoice } from "@mastra/voice-openai";
import { getMicrophoneStream } from "@mastra/node-audio";
import { createReadStream } from "fs";
import path from "path";

// Initialize a voice provider
const voice = new OpenAIVoice({
  listeningModel: {
    name: "whisper-1",
    apiKey: process.env.OPENAI_API_KEY,
  },
});

// Basic usage with a file stream
const audioFilePath = path.join(process.cwd(), "audio.mp3");
const audioStream = createReadStream(audioFilePath);
const transcript = await voice.listen(audioStream, {
  filetype: "mp3",
});
console.log("Transcribed text:", transcript);

// Using a microphone stream
const microphoneStream = getMicrophoneStream(); // Assume this function gets audio input
const transcription = await voice.listen(microphoneStream);

// With provider-specific options
const transcriptWithOptions = await voice.listen(audioStream, {
  language: "en",
  prompt: "This is a conversation about artificial intelligence.",
});

Using with CompositeVoice

When using CompositeVoice, the listen() method delegates to the configured listening provider:

import { CompositeVoice } from "@mastra/core/voice";
import { OpenAIVoice } from "@mastra/voice-openai";
import { PlayAIVoice } from "@mastra/voice-playai";

const voice = new CompositeVoice({
  input: new OpenAIVoice(),
  output: new PlayAIVoice(),
});

// This will use the OpenAIVoice provider
const transcript = await voice.listen(audioStream);

Using AI SDK Model Providers

You can also use AI SDK transcription models directly with CompositeVoice:

import { CompositeVoice } from "@mastra/core/voice";
import { PlayAIVoice } from "@mastra/voice-playai";
import { openai } from "@ai-sdk/openai";
import { groq } from "@ai-sdk/groq"; // groq transcription models can be used the same way

// Use AI SDK transcription models
const voice = new CompositeVoice({
  input: openai.transcription("whisper-1"), // AI SDK model
  output: new PlayAIVoice(), // Mastra provider
});

// Works the same way
const transcript = await voice.listen(audioStream);

// Provider-specific options can be passed through
const transcriptWithOptions = await voice.listen(audioStream, {
  providerOptions: {
    openai: {
      language: "en",
      prompt: "This is about AI",
    },
  },
});

See the CompositeVoice reference for more details on AI SDK integration.

Realtime Voice Providers

When using realtime voice providers like OpenAIRealtimeVoice, the listen() method behaves differently:

  • Instead of returning transcribed text, it emits 'writing' events with the transcribed text
  • You need to register an event listener to receive the transcription

import { OpenAIRealtimeVoice } from "@mastra/voice-openai-realtime";
import { getMicrophoneStream } from "@mastra/node-audio";

const voice = new OpenAIRealtimeVoice();
await voice.connect();

// Register event listener for transcription
voice.on("writing", ({ text, role }) => {
console.log(`${role}: ${text}`);
});

// This will emit 'writing' events instead of returning text
const microphoneStream = getMicrophoneStream();
await voice.listen(microphoneStream);

Notes

  • Not all voice providers support speech-to-text functionality (e.g., PlayAI, Speechify)
  • The behavior of listen() may vary slightly between providers, but all implementations follow the same basic interface
  • When using a realtime voice provider, the method might not return text directly but instead emit a 'writing' event
  • The audio format supported depends on the provider. Common formats include MP3, WAV, and M4A
  • Some providers support streaming transcription, where text is returned as it's transcribed
  • For best performance, consider closing or ending the audio stream when you're done with it