# Google

The Google Voice implementation in Mastra provides both text-to-speech (TTS) and speech-to-text (STT) capabilities using Google Cloud services. It supports multiple voices, languages, advanced audio configuration options, and both standard API key authentication and Vertex AI mode for enterprise deployments.

## Usage Example

```typescript
import { GoogleVoice } from "@mastra/voice-google";

// Initialize with default configuration (uses GOOGLE_API_KEY environment variable)
const voice = new GoogleVoice();

// Text-to-Speech
const audioStream = await voice.speak("Hello, world!", {
  languageCode: "en-US",
  audioConfig: {
    audioEncoding: "LINEAR16",
  },
});

// Speech-to-Text
const transcript = await voice.listen(audioStream, {
  config: {
    encoding: "LINEAR16",
    languageCode: "en-US",
  },
});

// Get available voices for a specific language
const voices = await voice.getSpeakers({ languageCode: "en-US" });
```

## Constructor Parameters

**speechModel?:** (`GoogleModelConfig`): Configuration for text-to-speech functionality (Default: `{ apiKey: process.env.GOOGLE_API_KEY }`)

**listeningModel?:** (`GoogleModelConfig`): Configuration for speech-to-text functionality (Default: `{ apiKey: process.env.GOOGLE_API_KEY }`)

**speaker?:** (`string`): Default voice ID to use for text-to-speech (Default: `'en-US-Casual-K'`)

**vertexAI?:** (`boolean`): Enable Vertex AI mode for enterprise deployments. Uses project-based authentication instead of API keys. Requires `project` to be set. (Default: `false`)

**project?:** (`string`): Google Cloud project ID (required when `vertexAI` is `true`). Falls back to the `GOOGLE_CLOUD_PROJECT` environment variable.

**location?:** (`string`): Google Cloud region for Vertex AI. Falls back to the `GOOGLE_CLOUD_LOCATION` environment variable. (Default: `'us-central1'`)

### GoogleModelConfig

**apiKey?:** (`string`): Google Cloud API key. Falls back to the `GOOGLE_API_KEY` environment variable. Not used when `vertexAI` is `true`.

**keyFilename?:** (`string`): Path to a service account JSON key file. Falls back to the `GOOGLE_APPLICATION_CREDENTIALS` environment variable.

**credentials?:** (`object`): In-memory service account credentials object with `client_email` and `private_key` properties.

## Methods

### speak()

Converts text to speech using the Google Cloud Text-to-Speech service.

**input:** (`string | NodeJS.ReadableStream`): Text to convert to speech. If a stream is provided, it will be converted to text first.

**options?:** (`object`): Speech synthesis options

**options.speaker?:** (`string`): Voice ID to use for this request

**options.languageCode?:** (`string`): Language code for the voice (e.g., 'en-US'). Defaults to the language code from the speaker ID or 'en-US'

**options.audioConfig?:** (`ISynthesizeSpeechRequest['audioConfig']`): Audio configuration options from the Google Cloud Text-to-Speech API (Default: `{ audioEncoding: 'LINEAR16' }`)

Returns: `Promise<NodeJS.ReadableStream>`

### listen()

Converts speech to text using the Google Cloud Speech-to-Text service.

**audioStream:** (`NodeJS.ReadableStream`): Audio stream to transcribe

**options?:** (`object`): Recognition options

**options.stream?:** (`boolean`): Whether to use streaming recognition

**options.config?:** (`IRecognitionConfig`): Recognition configuration from the Google Cloud Speech-to-Text API (Default: `{ encoding: 'LINEAR16', languageCode: 'en-US' }`)

Returns: `Promise<string>`

### getSpeakers()

Returns an array of available voice options, where each entry contains:

**voiceId:** (`string`): Unique identifier for the voice

**languageCodes:** (`string[]`): List of language codes supported by this voice

### isUsingVertexAI()

Checks if Vertex AI mode is enabled.

Returns: `boolean` - `true` if using Vertex AI, `false` otherwise

### getProject()

Gets the configured Google Cloud project ID.

Returns: `string | undefined` - The project ID or `undefined` if not set

### getLocation()

Gets the configured Google Cloud location/region.

Returns: `string` - The location (default: `'us-central1'`)

## Authentication

The Google Voice provider supports two authentication methods:

### Standard Mode (API Key)

Uses a Google Cloud API key for authentication. Suitable for development and simple use cases.

```typescript
// Using environment variable (GOOGLE_API_KEY)
const defaultVoice = new GoogleVoice();

// Using explicit API key
const apiKeyVoice = new GoogleVoice({
  speechModel: { apiKey: "your-api-key" },
  listeningModel: { apiKey: "your-api-key" },
  speaker: "en-US-Casual-K",
});
```

### Vertex AI Mode (Service Account)

Uses Google Cloud project-based authentication with service accounts. Recommended for production and enterprise deployments.

**Benefits:**

- Better security (no API keys in code)
- IAM-based access control
- Project-level billing and quotas
- Audit logging
- Enterprise features

**Configuration Options:**

```typescript
// Using Application Default Credentials (ADC)
// Set GOOGLE_APPLICATION_CREDENTIALS and GOOGLE_CLOUD_PROJECT env vars
const adcVoice = new GoogleVoice({
  vertexAI: true,
  project: "your-gcp-project",
  location: "us-central1", // Optional, defaults to 'us-central1'
});

// Using service account key file
const keyFileVoice = new GoogleVoice({
  vertexAI: true,
  project: "your-gcp-project",
  speechModel: {
    keyFilename: "/path/to/service-account.json",
  },
  listeningModel: {
    keyFilename: "/path/to/service-account.json",
  },
});

// Using in-memory credentials
const inMemoryVoice = new GoogleVoice({
  vertexAI: true,
  project: "your-gcp-project",
  speechModel: {
    credentials: {
      client_email: "service-account@project.iam.gserviceaccount.com",
      private_key: "-----BEGIN PRIVATE KEY-----\n...\n-----END PRIVATE KEY-----",
    },
  },
});
```

**Required Permissions:**

**IAM Roles:**

For Text-to-Speech:

- `roles/texttospeech.admin` - Text-to-Speech Admin (full access)
- `roles/texttospeech.editor` - Text-to-Speech Editor (create and manage)
- `roles/texttospeech.viewer` - Text-to-Speech Viewer (read-only)

For Speech-to-Text:

- `roles/speech.client` - Speech-to-Text Client

**OAuth Scopes:**

For synchronous Text-to-Speech synthesis:

- `https://www.googleapis.com/auth/cloud-platform` - Full access to Google Cloud Platform services

For long-audio Text-to-Speech operations:

- `locations.longAudioSynthesize` - Create long-audio synthesis operations
- `operations.get` - Get operation status
- `operations.list` - List operations

## Important Notes

1. **Authentication**: Either a Google Cloud API key (standard mode) or service account credentials (Vertex AI mode) is required.
2. **Environment Variables**:
   - `GOOGLE_API_KEY` - API key for standard mode
   - `GOOGLE_CLOUD_PROJECT` - Project ID for Vertex AI mode
   - `GOOGLE_CLOUD_LOCATION` - Location for Vertex AI mode (defaults to 'us-central1')
   - `GOOGLE_APPLICATION_CREDENTIALS` - Path to service account key file
3. The default voice is set to `'en-US-Casual-K'`.
4. Both text-to-speech and speech-to-text services use LINEAR16 as the default audio encoding.
5. The `speak()` method supports advanced audio configuration through the Google Cloud Text-to-Speech API.
6. The `listen()` method supports various recognition configurations through the Google Cloud Speech-to-Text API.
7. Available voices can be filtered by language code using the `getSpeakers()` method.
8. Vertex AI mode provides enterprise features including IAM control, audit logs, and project-level billing.
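
The environment variables listed above can be turned into constructor options in one place. The following is a minimal sketch, not part of `@mastra/voice-google`: `resolveGoogleVoiceOptions` is a hypothetical helper, the two interfaces are local mirrors of the constructor parameters documented on this page, and preferring Vertex AI mode whenever a project ID is present is an assumption made for illustration.

```typescript
// Hypothetical helper (not provided by @mastra/voice-google). It only builds
// the options object that could be passed to `new GoogleVoice(...)`.

// Local mirror of the documented GoogleModelConfig fields used here.
interface GoogleModelConfig {
  apiKey?: string;
}

// Local mirror of the documented constructor parameters used here.
interface GoogleVoiceOptions {
  speechModel?: GoogleModelConfig;
  listeningModel?: GoogleModelConfig;
  vertexAI?: boolean;
  project?: string;
  location?: string;
}

type Env = Record<string, string | undefined>;

function resolveGoogleVoiceOptions(env: Env): GoogleVoiceOptions {
  const project = env.GOOGLE_CLOUD_PROJECT;
  if (project) {
    // Assumption: choose Vertex AI mode whenever a project ID is configured.
    return {
      vertexAI: true,
      project,
      // Same default region as documented for the `location` parameter.
      location: env.GOOGLE_CLOUD_LOCATION || "us-central1",
    };
  }

  const apiKey = env.GOOGLE_API_KEY;
  if (apiKey) {
    // Standard mode: the same API key is used for both services.
    return {
      speechModel: { apiKey },
      listeningModel: { apiKey },
    };
  }

  throw new Error(
    "Set GOOGLE_API_KEY (standard mode) or GOOGLE_CLOUD_PROJECT (Vertex AI mode).",
  );
}
```

In an application, the result would typically be passed along as `new GoogleVoice(resolveGoogleVoiceOptions(process.env))`. Failing early with a clear message is a design choice of this sketch; the constructor's own behavior when no credentials are set is not described here.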