voice.listen()

The listen() method is a core method available on Mastra voice providers that support speech-to-text. It takes an audio stream as input and returns the transcribed text.

Usage Example

import { OpenAIVoice } from "@mastra/voice-openai";
import { getMicrophoneStream } from "@mastra/node-audio";
import { createReadStream } from "fs";
import path from "path";

// Initialize a voice provider
const voice = new OpenAIVoice({
  listeningModel: {
    name: "whisper-1",
    apiKey: process.env.OPENAI_API_KEY,
  },
});

// Basic usage with a file stream
const audioFilePath = path.join(process.cwd(), "audio.mp3");
const audioStream = createReadStream(audioFilePath);
const transcript = await voice.listen(audioStream, {
  filetype: "mp3",
});
console.log("Transcribed text:", transcript);

// Using a microphone stream
const microphoneStream = getMicrophoneStream(); // Assume this function gets audio input
const transcription = await voice.listen(microphoneStream);

// With provider-specific options
const transcriptWithOptions = await voice.listen(audioStream, {
  language: "en",
  prompt: "This is a conversation about artificial intelligence.",
});

Parameters

audioStream: NodeJS.ReadableStream
Audio stream to transcribe. This can be a file stream or a microphone stream.

options?: object
Provider-specific options for speech recognition

Return Value

Returns one of the following:

  • Promise<string>: A promise that resolves to the transcribed text
  • Promise<NodeJS.ReadableStream>: A promise that resolves to a stream of transcribed text (for streaming transcription)
  • Promise<void>: For real-time providers that emit 'writing' events instead of returning text directly

Provider-Specific Options

Each voice provider may support additional options specific to their implementation. Here are some examples:

OpenAI

options.filetype?: string = 'mp3'
Audio file format (e.g., 'mp3', 'wav', 'm4a')

options.prompt?: string
Text to guide the model's transcription

options.language?: string
Language code (e.g., 'en', 'fr', 'de')

Google

options.stream?: boolean = false
Whether to use streaming recognition

options.config?: object = { encoding: 'LINEAR16', languageCode: 'en-US' }
Recognition configuration from the Google Cloud Speech-to-Text API
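
As a sketch of how these options might be assembled, the `config` object follows Google Cloud Speech-to-Text's RecognitionConfig shape; the `sampleRateHertz` field here is an illustrative extra, not a Mastra default:

```typescript
// Recognition options for the Google provider. `encoding` and `languageCode`
// match the documented defaults; `sampleRateHertz` is an illustrative
// addition from Google's RecognitionConfig, not a Mastra default.
const googleListenOptions = {
  stream: false,
  config: {
    encoding: "LINEAR16",
    languageCode: "en-US",
    sampleRateHertz: 16000,
  },
};

// const transcript = await voice.listen(audioStream, googleListenOptions);
```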

Deepgram

options.model?: string = 'nova-2'
Deepgram model to use for transcription

options.language?: string = 'en'
Language code for transcription

Realtime Voice Providers

When using realtime voice providers like OpenAIRealtimeVoice, the listen() method behaves differently:

  • Instead of returning transcribed text, it emits 'writing' events with the transcribed text
  • You need to register an event listener to receive the transcription

import { OpenAIRealtimeVoice } from "@mastra/voice-openai-realtime";
import { getMicrophoneStream } from "@mastra/node-audio";

const voice = new OpenAIRealtimeVoice();
await voice.connect();

// Register event listener for transcription
voice.on("writing", ({ text, role }) => {
  console.log(`${role}: ${text}`);
});

// This will emit 'writing' events instead of returning text
const microphoneStream = getMicrophoneStream();
await voice.listen(microphoneStream);

Using with CompositeVoice

When using CompositeVoice, the listen() method delegates to the configured listening provider:

import { CompositeVoice } from "@mastra/core/voice";
import { OpenAIVoice } from "@mastra/voice-openai";
import { PlayAIVoice } from "@mastra/voice-playai";

const voice = new CompositeVoice({
  listenProvider: new OpenAIVoice(),
  speakProvider: new PlayAIVoice(),
});

// This will use the OpenAIVoice provider
const transcript = await voice.listen(audioStream);

Notes

  • Not all voice providers support speech-to-text functionality (e.g., PlayAI, Speechify)
  • The behavior of listen() may vary slightly between providers, but all implementations follow the same basic interface
  • When using a realtime voice provider, the method might not return text directly but instead emit a 'writing' event
  • The audio format supported depends on the provider. Common formats include MP3, WAV, and M4A
  • Some providers support streaming transcription, where text is returned as it's transcribed
  • For best performance, consider closing or ending the audio stream when you're done with it
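
The last point can be sketched with a small wrapper; listenAndClose is a hypothetical helper, not part of Mastra, that guarantees the audio stream is destroyed once transcription settles:

```typescript
import { Readable } from "stream";

// Hypothetical helper: run a listen-style function and always release the
// underlying audio resource (file handle, microphone) afterwards, even if
// transcription fails.
async function listenAndClose(
  listen: (audio: Readable) => Promise<string>,
  audio: Readable,
): Promise<string> {
  try {
    return await listen(audio);
  } finally {
    audio.destroy(); // close the stream even on error
  }
}
```

Usage would look like `await listenAndClose((s) => voice.listen(s), createReadStream(audioFilePath))`.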