Google

The Google Voice implementation in Mastra provides both text-to-speech (TTS) and speech-to-text (STT) capabilities using Google Cloud services. It supports multiple voices, languages, and advanced audio configuration options.

Usage Example

    import { GoogleVoice } from "@mastra/voice-google";

    // Initialize with default configuration (uses the GOOGLE_API_KEY environment variable)
    const voice = new GoogleVoice();

    // Alternatively, initialize with custom configuration
    const customVoice = new GoogleVoice({
      speechModel: {
        apiKey: 'your-speech-api-key',
      },
      listeningModel: {
        apiKey: 'your-listening-api-key',
      },
      speaker: 'en-US-Casual-K',
    });

    // Text-to-Speech
    const audioStream = await voice.speak("Hello, world!", {
      languageCode: 'en-US',
      audioConfig: {
        audioEncoding: 'LINEAR16',
      },
    });

    // Speech-to-Text
    const transcript = await voice.listen(audioStream, {
      config: {
        encoding: 'LINEAR16',
        languageCode: 'en-US',
      },
    });

    // Get available voices for a specific language
    const voices = await voice.getSpeakers({ languageCode: 'en-US' });

Constructor Parameters

speechModel?:

GoogleModelConfig
= { apiKey: process.env.GOOGLE_API_KEY }
Configuration for text-to-speech functionality

listeningModel?:

GoogleModelConfig
= { apiKey: process.env.GOOGLE_API_KEY }
Configuration for speech-to-text functionality

speaker?:

string
= 'en-US-Casual-K'
Default voice ID to use for text-to-speech

GoogleModelConfig

apiKey?:

string
Google Cloud API key. Falls back to GOOGLE_API_KEY environment variable
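The fallback described above can be pictured with a minimal sketch. The helper name resolveApiKey, the injectable env parameter, and the thrown error are assumptions made for illustration. They are not part of @mastra/voice-google, and the library's actual behavior when no key is found may differ.

```typescript
// Hypothetical helper (not a Mastra API): mirrors the documented fallback,
// where an explicit apiKey wins and GOOGLE_API_KEY is used otherwise.
interface GoogleModelConfig {
  apiKey?: string;
}

function resolveApiKey(
  config: GoogleModelConfig = {},
  env: Record<string, string | undefined> = process.env,
): string {
  const apiKey = config.apiKey ?? env.GOOGLE_API_KEY;
  if (!apiKey) {
    // Assumption: failing early is this sketch's choice, not documented library behavior.
    throw new Error("No Google API key: set GOOGLE_API_KEY or pass apiKey");
  }
  return apiKey;
}
```

Passing env explicitly keeps the helper easy to exercise without mutating process.env.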

Methods

speak()

Converts text to speech using Google Cloud Text-to-Speech service.

input:

string | NodeJS.ReadableStream
Text to convert to speech. If a stream is provided, it will be converted to text first.

options?:

object
Speech synthesis options

options.speaker?:

string
Voice ID to use for this request

options.languageCode?:

string
Language code for the voice (e.g., 'en-US'). Defaults to the language code from the speaker ID or 'en-US'

options.audioConfig?:

ISynthesizeSpeechRequest['audioConfig']
= { audioEncoding: 'LINEAR16' }
Audio configuration options from Google Cloud Text-to-Speech API

Returns: Promise<NodeJS.ReadableStream>
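Because speak() resolves to a stream rather than a buffer, callers often need to collect the audio before saving or sending it. The following is a minimal sketch of one way to do that with plain Node.js streams. streamToBuffer and fakeAudio are hypothetical names, and fakeAudio stands in for the stream that speak() would return so the sketch runs without credentials.

```typescript
import { Readable } from "node:stream";

// Hypothetical helper (not a Mastra API): drains a readable stream into one Buffer.
async function streamToBuffer(stream: NodeJS.ReadableStream): Promise<Buffer> {
  const chunks: Buffer[] = [];
  for await (const chunk of stream) {
    // Chunks may arrive as strings or Buffers depending on the stream's encoding.
    chunks.push(typeof chunk === "string" ? Buffer.from(chunk) : chunk);
  }
  return Buffer.concat(chunks);
}

// Stand-in for the stream returned by voice.speak(); three bytes in two chunks.
const fakeAudio = Readable.from([Buffer.from([1, 2]), Buffer.from([3])]);
```

With a real GoogleVoice instance, the same helper would be applied to the result of await voice.speak("Hello, world!"), and the buffer could then be written to disk with fs.promises.writeFile.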

listen()

Converts speech to text using Google Cloud Speech-to-Text service.

audioStream:

NodeJS.ReadableStream
Audio stream to transcribe

options?:

object
Recognition options

options.stream?:

boolean
Whether to use streaming recognition

options.config?:

IRecognitionConfig
= { encoding: 'LINEAR16', languageCode: 'en-US' }
Recognition configuration from Google Cloud Speech-to-Text API

Returns: Promise<string>
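The documentation does not say whether a partial options.config is merged with the defaults or replaces them. A cautious approach is to build the full configuration explicitly before calling listen(). The sketch below assumes only the two documented default fields plus sampleRateHertz from the Speech-to-Text recognition config. RecognitionConfigSketch and buildRecognitionConfig are hypothetical names, not part of the library.

```typescript
// Simplified stand-in for IRecognitionConfig, limited to the fields used here.
interface RecognitionConfigSketch {
  encoding: string;
  languageCode: string;
  sampleRateHertz?: number;
}

// Defaults as documented for listen().
const DEFAULT_RECOGNITION_CONFIG: RecognitionConfigSketch = {
  encoding: "LINEAR16",
  languageCode: "en-US",
};

// Hypothetical helper: overrides win, and anything omitted keeps the documented default.
function buildRecognitionConfig(
  overrides: Partial<RecognitionConfigSketch> = {},
): RecognitionConfigSketch {
  return { ...DEFAULT_RECOGNITION_CONFIG, ...overrides };
}
```

The result would be passed as options.config, for example buildRecognitionConfig({ languageCode: "de-DE" }). A cast to the library's own config type may be needed.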

getSpeakers()

Returns an array of available voices, optionally filtered by language code. Each entry contains:

voiceId:

string
Unique identifier for the voice

languageCodes:

string[]
List of language codes supported by this voice
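Since each entry carries a languageCodes array, results can also be filtered on the client. This is a minimal sketch using the documented entry shape. SpeakerInfo and voicesForLanguage are hypothetical names, and every sample voice ID other than 'en-US-Casual-K' is invented for illustration.

```typescript
// Shape of one getSpeakers() entry, as documented above.
interface SpeakerInfo {
  voiceId: string;
  languageCodes: string[];
}

// Hypothetical helper: returns the IDs of voices that support the given language code.
function voicesForLanguage(speakers: SpeakerInfo[], languageCode: string): string[] {
  return speakers
    .filter((speaker) => speaker.languageCodes.includes(languageCode))
    .map((speaker) => speaker.voiceId);
}

// Sample data standing in for a getSpeakers() result.
const sampleSpeakers: SpeakerInfo[] = [
  { voiceId: "en-US-Casual-K", languageCodes: ["en-US"] },
  { voiceId: "example-voice-b", languageCodes: ["en-GB", "en-US"] }, // invented ID
  { voiceId: "example-voice-c", languageCodes: ["fr-FR"] }, // invented ID
];
```

Calling voicesForLanguage(sampleSpeakers, "en-US") selects the first two sample entries and leaves out the French-only voice.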

Important Notes

  1. A Google Cloud API key is required. Set it via the GOOGLE_API_KEY environment variable or pass it in the constructor.
  2. The default voice is 'en-US-Casual-K'.
  3. Both text-to-speech and speech-to-text services use LINEAR16 as the default audio encoding.
  4. The speak() method supports advanced audio configuration through the Google Cloud Text-to-Speech API.
  5. The listen() method supports various recognition configurations through the Google Cloud Speech-to-Text API.
  6. Available voices can be filtered by language code using the getSpeakers() method.