voice.send()
The send() method streams audio data in real time to voice providers for continuous processing. This method is essential for real-time speech-to-speech conversations, allowing you to send microphone input directly to the AI service.
Usage Example
import { OpenAIRealtimeVoice } from "@mastra/voice-openai-realtime";
import Speaker from "@mastra/node-speaker";
import { getMicrophoneStream } from "@mastra/node-audio";
const speaker = new Speaker({
  sampleRate: 24000, // Audio sample rate in Hz - matches the 24 kHz PCM output of OpenAI's realtime models
  channels: 1, // Mono audio output (as opposed to stereo, which would be 2)
  bitDepth: 16, // Bit depth for audio quality - 16-bit resolution, the same as CD audio
});
// Initialize a real-time voice provider
const voice = new OpenAIRealtimeVoice({
  realtimeConfig: {
    model: "gpt-4o-mini-realtime-preview",
    apiKey: process.env.OPENAI_API_KEY,
  },
});
// Connect to the real-time service
await voice.connect();
// Set up event listeners for responses
voice.on("writing", ({ text, role }) => {
console.log(`${role}: ${text}`);
});
voice.on("speaker", (stream) => {
stream.pipe(speaker)
});
// Get microphone stream (implementation depends on your environment)
const microphoneStream = getMicrophoneStream();
// Send audio data to the voice provider
await voice.send(microphoneStream);
// You can also send audio data as Int16Array
const audioBuffer = getAudioBuffer(); // Assume this returns Int16Array
await voice.send(audioBuffer);
Parameters
audioData: NodeJS.ReadableStream | Int16Array
Audio data to send to the voice provider. Can be a readable stream (such as a microphone stream) or an Int16Array of audio samples.
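If your audio source produces raw PCM bytes rather than samples, you can reinterpret the bytes before sending. A minimal sketch, assuming 16-bit little-endian PCM in a Node.js Buffer whose byte offset is 2-byte aligned; getRawPcm() is a hypothetical source:
// Sketch: view a Buffer of 16-bit little-endian PCM as an Int16Array.
// Assumes an even byte length (whole samples) and 2-byte alignment.
const pcmBuffer = getRawPcm(); // hypothetical PCM source
const samples = new Int16Array(
  pcmBuffer.buffer,
  pcmBuffer.byteOffset,
  pcmBuffer.byteLength / 2, // two bytes per 16-bit sample
);
await voice.send(samples);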
Return Value
Returns a Promise<void> that resolves when the audio data has been accepted by the voice provider.
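Because the promise only signals acceptance, you may want to guard the call. A minimal sketch, assuming the provider rejects the promise when the connection has failed (error behavior is provider-specific):
// Sketch: await send() and surface provider errors.
// Assumes a failed or missing connection causes a rejection.
try {
  await voice.send(microphoneStream);
} catch (err) {
  console.error("Failed to stream audio to the provider:", err);
}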
Notes
- This method is only implemented by real-time voice providers that support speech-to-speech capabilities
- If called on a voice provider that doesn't support this functionality, it will log a warning and resolve immediately
- You must call connect() before using send() to establish the WebSocket connection
- The audio format requirements depend on the specific voice provider
- For continuous conversation, you typically call send() to transmit user audio, then answer() to trigger the AI response; see the sketch after this list
- The provider will typically emit 'writing' events with transcribed text as it processes the audio
- When the AI responds, the provider will emit 'speaking' events with the audio response
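Putting the last few notes together, one conversation turn might look like the sketch below; recordUserTurn() is a hypothetical helper that resolves once the user has finished speaking:
// Sketch of a single turn: send the user's audio, then trigger the reply.
// The provider's transcription and audio events fire as it processes.
async function conversationTurn() {
  const userAudio = await recordUserTurn(); // hypothetical capture helper
  await voice.send(userAudio); // transmit the user's speech
  await voice.answer(); // ask the AI to respond
}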