The Google Voice implementation in Mastra provides both text-to-speech (TTS) and speech-to-text (STT) capabilities using Google Cloud services. It supports multiple voices, languages, advanced audio configuration options, and both standard API key authentication and Vertex AI mode for enterprise deployments.
Usage Example
import { GoogleVoice } from "@mastra/voice-google";

// Initialize with default configuration (uses the GOOGLE_API_KEY environment variable)
const voice = new GoogleVoice();

// Text-to-Speech
const audioStream = await voice.speak("Hello, world!", {
  languageCode: "en-US",
  audioConfig: {
    audioEncoding: "LINEAR16",
  },
});

// Speech-to-Text
const transcript = await voice.listen(audioStream, {
  config: {
    encoding: "LINEAR16",
    languageCode: "en-US",
  },
});

// Get available voices for a specific language
const voices = await voice.getSpeakers({ languageCode: "en-US" });
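Constructor Parameters
speechModel?: GoogleModelConfig for the text-to-speech client.
listeningModel?: GoogleModelConfig for the speech-to-text client.
speaker?: Default voice ID for text-to-speech (default: 'en-US-Casual-K').
vertexAI?: Set to true to enable Vertex AI mode (service account authentication).
project?: Google Cloud project ID for Vertex AI mode. Can also be set with the GOOGLE_CLOUD_PROJECT environment variable.
location?: Google Cloud location for Vertex AI mode (default: 'us-central1'). Can also be set with the GOOGLE_CLOUD_LOCATION environment variable.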
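GoogleModelConfig
apiKey?: Google Cloud API key for standard mode. Falls back to the GOOGLE_API_KEY environment variable.
keyFilename?: Path to a service account key file.
credentials?: In-memory service account credentials (for example, client_email and private_key).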
Methods
speak()
Converts text to speech using Google Cloud Text-to-Speech service.
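input: The text to convert to speech.
options?: Optional settings for this call.
options.speaker?: Voice ID to use for this request.
options.languageCode?: Language code for the voice (for example, "en-US").
options.audioConfig?: Audio configuration passed to Google Cloud Text-to-Speech (for example, audioEncoding).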
Returns: Promise<NodeJS.ReadableStream>
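Because speak() resolves to a readable stream, callers that want to write a file or return an HTTP response usually need to buffer it first. The following is a minimal sketch, assuming the stream yields Buffer or string chunks; streamToBuffer is a hypothetical helper and not part of @mastra/voice-google.

```typescript
// Hypothetical helper (not part of @mastra/voice-google): drains an async
// iterable of chunks, such as a Node.js readable stream, into one Buffer.
async function streamToBuffer(
  stream: AsyncIterable<Buffer | string>,
): Promise<Buffer> {
  const chunks: Buffer[] = [];
  for await (const chunk of stream) {
    // Streams may emit strings when an encoding is set; normalize to Buffer.
    chunks.push(typeof chunk === "string" ? Buffer.from(chunk) : chunk);
  }
  return Buffer.concat(chunks);
}
```

With a GoogleVoice instance named voice, this could be used as `const audio = await streamToBuffer(await voice.speak("Hello, world!"));`, after which `audio` can be written to disk with the usual fs APIs.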
listen()
Converts speech to text using Google Cloud Speech-to-Text service.
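audioStream: The audio stream to transcribe.
options?: Optional settings for this call.
options.stream?:
options.config?: Recognition configuration passed to Google Cloud Speech-to-Text (for example, encoding and languageCode).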
Returns: Promise<string>
getSpeakers()
Returns the available voices, optionally filtered by languageCode. Each entry contains:
voiceId: Unique identifier of the voice.
languageCodes: Language codes the voice supports.
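Once the list is loaded, further filtering can happen locally instead of calling the service again. The sketch below assumes voiceId is a string and languageCodes is an array of strings, which the field names suggest but this page does not state; voicesForLanguage and the sample data are hypothetical.

```typescript
// Assumed shape of each entry returned by getSpeakers(); other fields may exist.
interface SpeakerInfo {
  voiceId: string;
  languageCodes: string[];
}

// Hypothetical helper: keep voices whose language code equals the given
// language ("en") or is a regional variant of it ("en-US", "en-GB").
function voicesForLanguage(voices: SpeakerInfo[], language: string): SpeakerInfo[] {
  const wanted = language.toLowerCase();
  return voices.filter((voice) =>
    voice.languageCodes.some((code) => {
      const normalized = code.toLowerCase();
      return normalized === wanted || normalized.startsWith(wanted + "-");
    }),
  );
}

// Sample data for illustration only; the second voice ID is a placeholder.
const sampleVoices: SpeakerInfo[] = [
  { voiceId: "en-US-Casual-K", languageCodes: ["en-US"] },
  { voiceId: "placeholder-fr-voice", languageCodes: ["fr-FR"] },
];

console.log(voicesForLanguage(sampleVoices, "en").map((v) => v.voiceId));
// logs [ 'en-US-Casual-K' ]
```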
isUsingVertexAI()
Checks if Vertex AI mode is enabled.
Returns: boolean - true if using Vertex AI, false otherwise
getProject()
Gets the configured Google Cloud project ID.
Returns: string | undefined - The project ID or undefined if not set
getLocation()
Gets the configured Google Cloud location/region.
Returns: string - The location (default: 'us-central1')
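The three accessors above are handy for confirming at startup which mode an instance ended up in. This is a small sketch that depends only on those documented return types; VoiceModeInfo and describeVoiceMode are hypothetical names introduced for illustration.

```typescript
// Structural type covering only the three accessors documented above, so the
// helper works with any object that exposes them.
interface VoiceModeInfo {
  isUsingVertexAI(): boolean;
  getProject(): string | undefined;
  getLocation(): string;
}

// Hypothetical helper: builds a one-line summary suitable for startup logs.
function describeVoiceMode(voice: VoiceModeInfo): string {
  if (!voice.isUsingVertexAI()) {
    return "standard mode (API key)";
  }
  const project = voice.getProject() ?? "<project not set>";
  return `Vertex AI mode (project=${project}, location=${voice.getLocation()})`;
}
```

Passing a GoogleVoice instance, for example `console.log(describeVoiceMode(voice))`, should work as long as the instance exposes the methods listed in this section.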
Authentication
The Google Voice provider supports two authentication methods:
Standard Mode (API Key)
Uses a Google Cloud API key for authentication. Suitable for development and simple use cases.
// Option 1: using the environment variable (GOOGLE_API_KEY)
const voiceFromEnv = new GoogleVoice();

// Option 2: using an explicit API key
const voiceWithKey = new GoogleVoice({
  speechModel: { apiKey: "your-api-key" },
  listeningModel: { apiKey: "your-api-key" },
  speaker: "en-US-Casual-K",
});
Vertex AI Mode (Service Account)
Uses Google Cloud project-based authentication with service accounts. Recommended for production and enterprise deployments.
Benefits:
- Better security (no API keys in code)
- IAM-based access control
- Project-level billing and quotas
- Audit logging
- Enterprise features
Configuration Options:
// Option 1: using Application Default Credentials (ADC)
// Set the GOOGLE_APPLICATION_CREDENTIALS and GOOGLE_CLOUD_PROJECT env vars
const adcVoice = new GoogleVoice({
  vertexAI: true,
  project: "your-gcp-project",
  location: "us-central1", // Optional, defaults to 'us-central1'
});

// Option 2: using a service account key file
const keyFileVoice = new GoogleVoice({
  vertexAI: true,
  project: "your-gcp-project",
  speechModel: {
    keyFilename: "/path/to/service-account.json",
  },
  listeningModel: {
    keyFilename: "/path/to/service-account.json",
  },
});

// Option 3: using in-memory credentials
const inMemoryVoice = new GoogleVoice({
  vertexAI: true,
  project: "your-gcp-project",
  speechModel: {
    credentials: {
      client_email: "service-account@project.iam.gserviceaccount.com",
      private_key: "-----BEGIN PRIVATE KEY-----\n...\n-----END PRIVATE KEY-----",
    },
  },
});
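When the service account key is delivered as a secret string rather than a file, the in-memory credentials object has to be built by hand. The sketch below assumes the secret contains the service account JSON; parseServiceAccount is a hypothetical helper, and the newline handling covers the common case where a secret store has escaped the line breaks inside private_key.

```typescript
// Only the two fields used in the in-memory example above.
interface ServiceAccountCredentials {
  client_email: string;
  private_key: string;
}

// Hypothetical helper: validates a service account JSON string and returns
// the fields needed for the `credentials` option.
function parseServiceAccount(json: string): ServiceAccountCredentials {
  const parsed: unknown = JSON.parse(json);
  if (typeof parsed !== "object" || parsed === null) {
    throw new Error("Service account JSON must be an object");
  }
  const { client_email, private_key } = parsed as Record<string, unknown>;
  if (typeof client_email !== "string" || typeof private_key !== "string") {
    throw new Error("Service account JSON needs client_email and private_key strings");
  }
  // Restore real line breaks if the key was stored with literal "\n" sequences.
  return { client_email, private_key: private_key.replace(/\\n/g, "\n") };
}
```

The result can then be passed as `speechModel: { credentials: parseServiceAccount(secret) }`, where `secret` is whatever string your secret store returns.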
Required Permissions:
IAM Roles:
For Text-to-Speech:
- roles/texttospeech.admin - Text-to-Speech Admin (full access)
- roles/texttospeech.editor - Text-to-Speech Editor (create and manage)
- roles/texttospeech.viewer - Text-to-Speech Viewer (read-only)
For Speech-to-Text:
- roles/speech.client - Speech-to-Text Client
OAuth Scopes:
For synchronous Text-to-Speech synthesis:
- https://www.googleapis.com/auth/cloud-platform - Full access to Google Cloud Platform services
For long-audio Text-to-Speech operations:
- locations.longAudioSynthesize - Create long-audio synthesis operations
- operations.get - Get operation status
- operations.list - List operations
Important Notes
- Authentication: Either a Google Cloud API key (standard mode) or service account credentials (Vertex AI mode) is required.
- Environment Variables:
  - GOOGLE_API_KEY - API key for standard mode
  - GOOGLE_CLOUD_PROJECT - Project ID for Vertex AI mode
  - GOOGLE_CLOUD_LOCATION - Location for Vertex AI mode (defaults to 'us-central1')
  - GOOGLE_APPLICATION_CREDENTIALS - Path to service account key file
- The default voice is 'en-US-Casual-K'.
- Both text-to-speech and speech-to-text services use LINEAR16 as the default audio encoding.
- The speak() method supports advanced audio configuration through the Google Cloud Text-to-Speech API.
- The listen() method supports various recognition configurations through the Google Cloud Speech-to-Text API.
- Available voices can be filtered by language code using the getSpeakers() method.
- Vertex AI mode provides enterprise features including IAM control, audit logs, and project-level billing.
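The environment variables listed above can also be turned into explicit constructor options, which makes the chosen mode visible in code and easy to test. This is a minimal sketch under stated assumptions: resolveVoiceOptions is a hypothetical helper, and preferring Vertex AI mode when both a project and an API key are set is a choice made for this example, not documented library behavior.

```typescript
type Env = Record<string, string | undefined>;

// Subset of the constructor options described on this page.
interface ResolvedVoiceOptions {
  vertexAI: boolean;
  project?: string;
  location?: string;
  speechModel?: { apiKey: string };
  listeningModel?: { apiKey: string };
}

// Hypothetical helper: picks an authentication mode from environment values.
function resolveVoiceOptions(env: Env): ResolvedVoiceOptions {
  const project = env.GOOGLE_CLOUD_PROJECT;
  if (project) {
    // Assumption of this sketch: a configured project means Vertex AI mode.
    return {
      vertexAI: true,
      project,
      location: env.GOOGLE_CLOUD_LOCATION ?? "us-central1",
    };
  }
  const apiKey = env.GOOGLE_API_KEY;
  if (apiKey) {
    return { vertexAI: false, speechModel: { apiKey }, listeningModel: { apiKey } };
  }
  throw new Error(
    "Set GOOGLE_CLOUD_PROJECT (Vertex AI mode) or GOOGLE_API_KEY (standard mode)",
  );
}
```

In an application this might be wired up as `new GoogleVoice(resolveVoiceOptions(process.env))`, which fails fast at startup when neither variable is present.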