# Google Gemini Live Voice
The GeminiLiveVoice class provides real-time voice interaction capabilities using Google’s Gemini Live API. It supports bidirectional audio streaming, tool calling, session management, and both standard Google API and Vertex AI authentication methods.
## Usage Example
```typescript
import { GeminiLiveVoice } from "@mastra/voice-google-gemini-live";
import { playAudio, getMicrophoneStream } from "@mastra/node-audio";

// Initialize with the Gemini API (using an API key)
const voice = new GeminiLiveVoice({
  apiKey: process.env.GOOGLE_API_KEY, // Required for the Gemini API
  model: "gemini-2.0-flash-exp",
  speaker: "Puck", // Default voice
  debug: true,
});

// Or initialize with Vertex AI (using OAuth)
const voiceWithVertexAI = new GeminiLiveVoice({
  vertexAI: true,
  project: "your-gcp-project",
  location: "us-central1",
  serviceAccountKeyFile: "/path/to/service-account.json",
  model: "gemini-2.0-flash-exp",
  speaker: "Puck",
});

// Or use the VoiceConfig pattern (recommended for consistency with other providers)
const voiceWithConfig = new GeminiLiveVoice({
  speechModel: {
    name: "gemini-2.0-flash-exp",
    apiKey: process.env.GOOGLE_API_KEY,
  },
  speaker: "Puck",
  realtimeConfig: {
    model: "gemini-2.0-flash-exp",
    apiKey: process.env.GOOGLE_API_KEY,
    options: {
      debug: true,
      sessionConfig: {
        interrupts: { enabled: true },
      },
    },
  },
});

// Establish the connection (required before using other methods)
await voice.connect();

// Set up event listeners
voice.on("speaker", (audioStream) => {
  // Handle the audio stream (NodeJS.ReadableStream)
  playAudio(audioStream);
});

voice.on("writing", ({ text, role }) => {
  // Handle transcribed text
  console.log(`${role}: ${text}`);
});

voice.on("turnComplete", ({ timestamp }) => {
  // Handle turn completion
  console.log("Turn completed at:", timestamp);
});

// Convert text to speech
await voice.speak("Hello, how can I help you today?", {
  speaker: "Charon", // Override the default voice
  responseModalities: ["AUDIO", "TEXT"],
});

// Process audio input
const microphoneStream = getMicrophoneStream();
await voice.send(microphoneStream);

// Update the session configuration
await voice.updateSessionConfig({
  speaker: "Kore",
  instructions: "Be more concise in your responses",
});

// When done, disconnect
await voice.disconnect();

// Or use the synchronous wrapper
voice.close();
```
## Configuration

### Constructor Options
- `apiKey?: string`: Google API key. Required when using the Gemini API (i.e. when `vertexAI` is not set).
- `model?: string`: Model ID to use. Defaults to `gemini-2.0-flash-exp`.
- `speaker?: string`: Default voice ID. Defaults to `Puck`.
- `vertexAI?: boolean`: Use Vertex AI with OAuth instead of the Gemini API.
- `project?: string`: Google Cloud project ID (required for Vertex AI).
- `location?: string`: Google Cloud region, e.g. `us-central1` (Vertex AI).
- `serviceAccountKeyFile?: string`: Path to a service account JSON key file (Vertex AI).
- `serviceAccountEmail?: string`: Service account email to impersonate (Vertex AI).
- `instructions?: string`: System instructions for the model.
- `sessionConfig?: object`: Session configuration (see below).
- `debug?: boolean`: Enable debug logging.
### Session Configuration
- `interrupts?: object`: Interrupt handling settings.
- `interrupts.enabled?: boolean`: Whether interrupt handling is enabled.
- `interrupts.allowUserInterruption?: boolean`: Whether the user can interrupt the model mid-response.
- `contextCompression?: boolean`: Whether to compress conversation context during long sessions.
## Methods

### connect()

Establishes a connection to the Gemini Live API. Must be called before the speak, listen, or send methods are used.

- `runtimeContext?`: Optional runtime context to associate with the connection.
- Returns: `Promise<void>`
### speak()

Converts text to speech and sends it to the model. Accepts either a string or a readable stream as input.

- `input: string | NodeJS.ReadableStream`: Text (or a stream of text) to speak.
- `options?`: Speech options.
- `options.speaker?: string`: Voice ID for this utterance (overrides the constructor default).
- `options.languageCode?: string`: Language code for the response.
- `options.responseModalities?: ("AUDIO" | "TEXT")[]`: Modalities the model should respond with.
- Returns: `Promise<void>` (responses are emitted via the `speaker` and `writing` events)
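Since the input can be a readable stream, text generated incrementally (for example, by an LLM) can be forwarded as it arrives. A minimal sketch using Node's `Readable.from`:

```typescript
import { Readable } from "stream";

// speak() also accepts a readable stream of text chunks
const textStream = Readable.from(["Hello from a ", "streamed sentence."]);
await voice.speak(textStream, { speaker: "Kore" });
```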
### listen()

Processes audio input for speech recognition. Takes a readable stream of audio data and returns the transcribed text.

- `audioStream: NodeJS.ReadableStream`: Audio to transcribe.
- `options?`: Optional transcription settings.
- Returns: `Promise<string>`: The transcribed text.
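As a sketch, a one-shot transcription of a prerecorded file might look like this (the file path is illustrative, and the stream is assumed to match the 16kHz PCM16 input format noted at the end of this page):

```typescript
import { createReadStream } from "fs";

// Transcribe a prerecorded audio file (path is illustrative)
const recording = createReadStream("/path/to/recording.pcm");
const transcript = await voice.listen(recording);
console.log("Transcript:", transcript);
```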
### send()

Streams audio data in real time to the Gemini service, for continuous streaming scenarios such as live microphone input.

- `audioData: NodeJS.ReadableStream`: Audio stream to send.
- Returns: `Promise<void>`
### updateSessionConfig()

Updates the session configuration dynamically. Use this to modify voice settings, speaker selection, and other runtime options.

- `config`: Configuration updates to apply.
- Returns: `Promise<void>`
### addTools()

Adds a set of tools to the voice instance. Tools allow the model to perform additional actions during conversations. When GeminiLiveVoice is added to an Agent, any tools configured for the Agent are automatically available to the voice interface, as sketched below.

- `tools`: Tool definitions to make available to the model.
- Returns: `void`
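In practice you rarely call addTools() by hand; attaching the voice to an Agent exposes the Agent's tools automatically. A minimal sketch, assuming Mastra's usual Agent setup with an AI SDK model (the agent name and model choice here are illustrative):

```typescript
import { Agent } from "@mastra/core/agent";
import { google } from "@ai-sdk/google";

// Tools configured on the agent become available to the voice interface
const agent = new Agent({
  name: "voice-assistant",
  instructions: "You are a helpful voice assistant.",
  model: google("gemini-2.0-flash"),
  voice, // the GeminiLiveVoice instance from the usage example
});
```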
### addInstructions()

Adds or updates system instructions for the model.

- `instructions?: string`: Instruction text.
- Returns: `void`
### answer()

Triggers a response from the model. This method is primarily used internally when integrated with an Agent.

- `options?`: Optional response options.
- Returns: `Promise<void>`
### getSpeakers()

Returns a list of available voice speakers for the Gemini Live API.

- Returns: `Promise<Array<{ voiceId: string; description?: string }>>`
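For example, to inspect the available voices before picking a default:

```typescript
// List the available voices and their descriptions
const speakers = await voice.getSpeakers();
for (const { voiceId, description } of speakers) {
  console.log(voiceId, description ?? "");
}
```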
### disconnect()

Disconnects from the Gemini Live session and cleans up resources. This is the async method that properly handles cleanup.

- Returns: `Promise<void>`

### close()

Synchronous wrapper for disconnect(). Calls disconnect() internally without awaiting it.

- Returns: `void`
### on()

Registers an event listener for voice events.

- `event: string`: Name of the event to listen for.
- `callback: Function`: Function called when the event occurs.
- Returns: `void`

### off()

Removes a previously registered event listener.

- `event: string`: Name of the event to stop listening to.
- `callback: Function`: The same function reference that was passed to on().
- Returns: `void`
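Because off() matches by function reference, keep a handle on any listener you intend to remove:

```typescript
// Store the handler so the same reference can be passed to off()
const onWriting = ({ text, role }: { text: string; role: string }) => {
  console.log(`${role}: ${text}`);
};

voice.on("writing", onWriting);
// ...later, when the transcript is no longer needed
voice.off("writing", onWriting);
```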
## Events

The GeminiLiveVoice class emits the following events:

- `speaker`: Emitted with an audio stream (NodeJS.ReadableStream) when the model produces audio.
- `speaking`: Emitted with audio data chunks as the model speaks.
- `writing`: Emitted with transcribed text and its role (`{ text, role }`).
- `session`: Emitted when the session state changes.
- `turnComplete`: Emitted when the model finishes a turn (`{ timestamp }`).
- `toolCall`: Emitted when the model requests a tool call (`{ name, args, id }`).
- `usage`: Emitted with token usage information.
- `error`: Emitted when an error occurs.
- `interrupt`: Emitted when the current response is interrupted.
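The usage example above covers `speaker`, `writing`, and `turnComplete`; in production it is also worth wiring up `error` and `interrupt` handlers. The payload shapes in this sketch are assumptions, not part of the documented API:

```typescript
// Payload shapes below are illustrative assumptions
voice.on("error", (error) => {
  console.error("Voice session error:", error);
});

voice.on("interrupt", (event) => {
  // Fired when the current response is cut off, e.g. by the user speaking
  console.log("Interrupted:", event);
});
```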
## Available Models

The following Gemini Live models are available:

- `gemini-2.0-flash-exp` (default)
- `gemini-2.0-flash-exp-image-generation`
- `gemini-2.0-flash-live-001`
- `gemini-live-2.5-flash-preview-native-audio`
- `gemini-2.5-flash-exp-native-audio-thinking-dialog`
- `gemini-live-2.5-flash-preview`
- `gemini-2.5-flash-preview-tts`
## Available Voices

The following voice options are available:

- `Puck` (default): Conversational, friendly
- `Charon`: Deep, authoritative
- `Kore`: Neutral, professional
- `Fenrir`: Warm, approachable
## Authentication Methods

### Gemini API (Development)

The simplest method, using an API key from Google AI Studio:
```typescript
const voice = new GeminiLiveVoice({
  apiKey: "your-api-key", // Required for the Gemini API
  model: "gemini-2.0-flash-exp",
});
```
### Vertex AI (Production)

For production use with OAuth authentication and Google Cloud Platform:
```typescript
// Using a service account key file
const voiceWithKeyFile = new GeminiLiveVoice({
  vertexAI: true,
  project: "your-gcp-project",
  location: "us-central1",
  serviceAccountKeyFile: "/path/to/service-account.json",
});

// Using Application Default Credentials
const voiceWithADC = new GeminiLiveVoice({
  vertexAI: true,
  project: "your-gcp-project",
  location: "us-central1",
});

// Using service account impersonation
const voiceWithImpersonation = new GeminiLiveVoice({
  vertexAI: true,
  project: "your-gcp-project",
  location: "us-central1",
  serviceAccountEmail: "service-account@project.iam.gserviceaccount.com",
});
```
## Advanced Features

### Session Management

The Gemini Live API supports session resumption for recovering from network interruptions:
voice.on("sessionHandle", ({ handle, expiresAt }) => {
// Store session handle for resumption
saveSessionHandle(handle, expiresAt);
});
// Resume a previous session
const voice = new GeminiLiveVoice({
sessionConfig: {
enableResumption: true,
maxDuration: "2h",
},
});
### Tool Calling

Enable the model to call functions during conversations:
```typescript
import { z } from "zod";

voice.addTools({
  weather: {
    description: "Get weather information",
    parameters: z.object({
      location: z.string(),
    }),
    execute: async ({ location }) => {
      const weather = await getWeather(location); // your own weather lookup
      return weather;
    },
  },
});

voice.on("toolCall", ({ name, args, id }) => {
  console.log(`Tool called: ${name} with args:`, args);
});
```
## Notes

- The Gemini Live API uses WebSockets for real-time communication
- Audio is processed as 16kHz PCM16 on input and 24kHz PCM16 on output (see the playback sketch below)
- The voice instance must be connected with `connect()` before the other methods are used
- Always call `close()` when done to properly clean up resources
- Vertex AI authentication requires appropriate IAM permissions (the `aiplatform.user` role)
- Session resumption allows recovery from network interruptions
- The API supports real-time interactions with both text and audio
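`playAudio` from `@mastra/node-audio` already matches these formats, but if you wire up playback yourself, the sink must be configured for the 24kHz PCM16 output. A sketch using the `speaker` npm package (both the package choice and the mono channel count are assumptions):

```typescript
import Speaker from "speaker";

voice.on("speaker", (audioStream) => {
  // One sink per response: 24kHz sample rate, 16-bit PCM, mono (assumed)
  const sink = new Speaker({ channels: 1, bitDepth: 16, sampleRate: 24000 });
  audioStream.pipe(sink);
});
```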