Google

The Google Voice implementation in Mastra provides both text-to-speech (TTS) and speech-to-text (STT) capabilities using Google Cloud services. It supports multiple voices, languages, and advanced audio configuration options.

Usage Example


```typescript
import { GoogleVoice } from "@mastra/voice-google";

// Initialize with the default configuration (uses the GOOGLE_API_KEY environment variable)
const voice = new GoogleVoice();

// Alternatively, initialize with a custom configuration
// (a distinct name avoids redeclaring `voice`)
const customVoice = new GoogleVoice({
  speechModel: {
    apiKey: "your-speech-api-key",
  },
  listeningModel: {
    apiKey: "your-listening-api-key",
  },
  speaker: "en-US-Casual-K",
});

// Text-to-Speech
const audioStream = await voice.speak("Hello, world!", {
  languageCode: "en-US",
  audioConfig: {
    audioEncoding: "LINEAR16",
  },
});

// Speech-to-Text
const transcript = await voice.listen(audioStream, {
  config: {
    encoding: "LINEAR16",
    languageCode: "en-US",
  },
});

// Get available voices for a specific language
const voices = await voice.getSpeakers({ languageCode: "en-US" });
```
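`speak()` resolves to a `NodeJS.ReadableStream`, so its output can be piped anywhere Node accepts a stream. The following minimal sketch writes that stream to a file using Node's `pipeline()` from `node:stream/promises`. `saveAudioStream` is a hypothetical helper for illustration. It is not part of `@mastra/voice-google`.

```typescript
import { createWriteStream } from "node:fs";
import { pipeline } from "node:stream/promises";

// Hypothetical helper (not part of @mastra/voice-google): writes a
// NodeJS.ReadableStream, such as the one returned by voice.speak(), to disk.
async function saveAudioStream(
  audioStream: NodeJS.ReadableStream,
  filePath: string,
): Promise<void> {
  // pipeline() propagates errors from either stream, cleans both up, and
  // resolves only after the destination file has been fully written.
  await pipeline(audioStream, createWriteStream(filePath));
}
```

To use it, pass the stream returned by `voice.speak()` and a destination path of your choice. Using `pipeline()` rather than a bare `.pipe()` means a failed synthesis stream rejects the promise instead of leaving a half-written file unnoticed.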

Constructor Parameters

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| `speechModel?` | `GoogleModelConfig` | `{ apiKey: process.env.GOOGLE_API_KEY }` | Configuration for text-to-speech functionality |
| `listeningModel?` | `GoogleModelConfig` | `{ apiKey: process.env.GOOGLE_API_KEY }` | Configuration for speech-to-text functionality |
| `speaker?` | `string` | `'en-US-Casual-K'` | Default voice ID to use for text-to-speech |

GoogleModelConfig

| Parameter | Type | Description |
| --- | --- | --- |
| `apiKey?` | `string` | Google Cloud API key. Falls back to the `GOOGLE_API_KEY` environment variable. |
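Both `speechModel` and `listeningModel` resolve their key in the same documented order. An explicit `apiKey` wins. Otherwise, `GOOGLE_API_KEY` is read from the environment. The following sketch illustrates that precedence. `resolveApiKey` is hypothetical. Throwing an error when no key is found is an assumption, based only on the note that a key is required.

```typescript
// Mirror of the documented GoogleModelConfig shape.
interface GoogleModelConfig {
  apiKey?: string;
}

// Hypothetical helper: an explicit apiKey takes precedence, otherwise the
// GOOGLE_API_KEY environment variable is used (empty strings fall through).
// Throwing when neither is set is an assumption made for this sketch.
function resolveApiKey(
  config: GoogleModelConfig | undefined,
  env: Record<string, string | undefined> = process.env,
): string {
  const apiKey = config?.apiKey || env.GOOGLE_API_KEY;
  if (!apiKey) {
    throw new Error("No Google Cloud API key: set GOOGLE_API_KEY or pass apiKey");
  }
  return apiKey;
}
```

The `env` parameter defaults to `process.env` but can be overridden, which keeps the precedence rule easy to check without touching the real environment.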

Methods

speak()

Converts text to speech using Google Cloud Text-to-Speech service.

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| `input` | `string \| NodeJS.ReadableStream` | | Text to convert to speech. If a stream is provided, its contents are read into a string before synthesis. |
| `options?` | `object` | | Speech synthesis options |
| `options.speaker?` | `string` | | Voice ID to use for this request, overriding the default `speaker` |
| `options.languageCode?` | `string` | | Language code for the voice (e.g., `'en-US'`). Defaults to the language code derived from the speaker ID, or `'en-US'` |
| `options.audioConfig?` | `ISynthesizeSpeechRequest['audioConfig']` | `{ audioEncoding: 'LINEAR16' }` | Audio configuration options from the Google Cloud Text-to-Speech API |

Returns: Promise<NodeJS.ReadableStream>
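The defaults above interact in a specific way. A stream input is read into text first. The speaker falls back to the constructor default. The language code is derived from the speaker ID, for example `'en-US'` from `'en-US-Casual-K'`. The following sketch models that resolution under those documented rules. `prepareSpeakRequest`, `streamToText`, `languageCodeFromSpeaker` and the simplified `SpeakOptions` type are hypothetical. This is not the library's internal code.

```typescript
// Simplified, hypothetical subset of the documented speak() options.
interface SpeakOptions {
  speaker?: string;
  languageCode?: string;
  audioConfig?: { audioEncoding?: string };
}

// Reads a text stream (string or Buffer chunks) into one UTF-8 string.
async function streamToText(input: NodeJS.ReadableStream): Promise<string> {
  const chunks: Buffer[] = [];
  for await (const chunk of input) {
    chunks.push(Buffer.isBuffer(chunk) ? chunk : Buffer.from(chunk));
  }
  return Buffer.concat(chunks).toString("utf8");
}

// Takes the first two segments of a voice ID ("en-US-Casual-K" -> "en-US"),
// falling back to "en-US" when the ID has no language prefix.
function languageCodeFromSpeaker(speaker?: string): string {
  const parts = speaker ? speaker.split("-") : [];
  return parts.length >= 2 ? `${parts[0]}-${parts[1]}` : "en-US";
}

// Hypothetical helper: resolves text, voice, language and audio settings
// using the defaults documented for speak().
async function prepareSpeakRequest(
  input: string | NodeJS.ReadableStream,
  options: SpeakOptions = {},
  defaultSpeaker = "en-US-Casual-K",
) {
  const text = typeof input === "string" ? input : await streamToText(input);
  const speaker = options.speaker ?? defaultSpeaker;
  const languageCode = options.languageCode ?? languageCodeFromSpeaker(speaker);
  const audioConfig = options.audioConfig ?? { audioEncoding: "LINEAR16" };
  return { text, speaker, languageCode, audioConfig };
}
```

Taking the first two segments of the voice ID handles three-letter language prefixes such as `cmn-CN` as well as two-letter ones.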

listen()

Converts speech to text using Google Cloud Speech-to-Text service.

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| `audioStream` | `NodeJS.ReadableStream` | | Audio stream to transcribe |
| `options?` | `object` | | Recognition options |
| `options.stream?` | `boolean` | | Whether to use streaming recognition |
| `options.config?` | `IRecognitionConfig` | `{ encoding: 'LINEAR16', languageCode: 'en-US' }` | Recognition configuration from the Google Cloud Speech-to-Text API |

Returns: Promise<string>
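For non-streaming recognition, the audio has to be collected before it can be sent, and caller settings are layered over the documented defaults. The following sketch shows one way to do both. It assumes that caller `config` fields override the defaults with a shallow merge. It also assumes that audio is sent inline as base64 in Google's `RecognitionAudio.content` field. `prepareRecognizeRequest` and the trimmed `RecognitionConfig` type are hypothetical stand-ins, not the library's internals.

```typescript
// Simplified, hypothetical stand-in for Google's IRecognitionConfig.
interface RecognitionConfig {
  encoding?: string;
  languageCode?: string;
  sampleRateHertz?: number;
}

// Documented listen() defaults.
const DEFAULT_RECOGNITION_CONFIG: RecognitionConfig = {
  encoding: "LINEAR16",
  languageCode: "en-US",
};

// Hypothetical helper: buffers the whole audio stream and overlays the
// caller's config on the defaults (shallow merge, caller wins).
async function prepareRecognizeRequest(
  audioStream: NodeJS.ReadableStream,
  config: RecognitionConfig = {},
) {
  const chunks: Buffer[] = [];
  for await (const chunk of audioStream) {
    chunks.push(Buffer.isBuffer(chunk) ? chunk : Buffer.from(chunk));
  }
  const audio = Buffer.concat(chunks);
  return {
    config: { ...DEFAULT_RECOGNITION_CONFIG, ...config },
    // Inline audio is carried as base64 in RecognitionAudio.content.
    audio: { content: audio.toString("base64") },
  };
}
```

Buffering the entire stream in memory suits short clips. Longer audio is where the documented `options.stream` flag for streaming recognition becomes relevant.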

getSpeakers()

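Returns a promise that resolves to an array of available voices. Pass `{ languageCode }` to list only the voices for a given language. Each entry contains:

| Field | Type | Description |
| --- | --- | --- |
| `voiceId` | `string` | Unique identifier for the voice |
| `languageCodes` | `string[]` | List of language codes supported by this voice |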
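The result shape corresponds closely to the voice entries in Google's Text-to-Speech `voices.list` response, which carry `name` and `languageCodes`. The following sketch maps such entries to the documented `{ voiceId, languageCodes }` shape and filters them by language. `toSpeakers` is hypothetical. It filters locally for illustration only, and the real method may instead pass the language code to the Google API.

```typescript
// Only the listVoices response fields used here.
interface GoogleVoiceEntry {
  name?: string | null;
  languageCodes?: string[] | null;
}

// Documented shape of each getSpeakers() result.
interface SpeakerInfo {
  voiceId: string;
  languageCodes: string[];
}

// Hypothetical helper: drops unnamed entries, renames `name` to `voiceId`,
// and keeps only voices supporting `languageCode` when one is given.
function toSpeakers(voices: GoogleVoiceEntry[], languageCode?: string): SpeakerInfo[] {
  return voices
    .filter((v): v is GoogleVoiceEntry & { name: string } => typeof v.name === "string")
    .map((v) => ({ voiceId: v.name, languageCodes: v.languageCodes ?? [] }))
    .filter((s) => !languageCode || s.languageCodes.includes(languageCode));
}
```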

Important Notes

- A Google Cloud API key is required. Set it via the `GOOGLE_API_KEY` environment variable or pass it in the constructor.
- The default voice is `'en-US-Casual-K'`.
- Both text-to-speech and speech-to-text use `LINEAR16` as the default audio encoding.
- `speak()` accepts advanced audio settings from the Google Cloud Text-to-Speech API through `options.audioConfig`.
- `listen()` accepts recognition settings from the Google Cloud Speech-to-Text API through `options.config`.
- `getSpeakers()` can filter the available voices by language code.