Google

The Google Voice implementation in Mastra provides both text-to-speech (TTS) and speech-to-text (STT) capabilities using Google Cloud services. It supports multiple voices, languages, and advanced audio configuration options.

Usage Example

import { GoogleVoice } from "@mastra/voice-google";

// Initialize with default configuration (uses GOOGLE_API_KEY environment variable)
const defaultVoice = new GoogleVoice();

// Initialize with custom configuration
const voice = new GoogleVoice({
  speechModel: {
    apiKey: 'your-speech-api-key',
  },
  listeningModel: {
    apiKey: 'your-listening-api-key',
  },
  speaker: 'en-US-Casual-K',
});

// Text-to-Speech
const audioStream = await voice.speak("Hello, world!", {
  languageCode: 'en-US',
  audioConfig: {
    audioEncoding: 'LINEAR16',
  },
});

// Speech-to-Text
const transcript = await voice.listen(audioStream, {
  config: {
    encoding: 'LINEAR16',
    languageCode: 'en-US',
  },
});

// Get available voices for a specific language
const voices = await voice.getSpeakers({ languageCode: 'en-US' });

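speak() resolves to a NodeJS.ReadableStream, so saving or inspecting the audio means draining that stream. The sketch below is one way to do it, assuming a Node.js version that supports async iteration over streams; `streamToBuffer` is a hypothetical helper, not part of `@mastra/voice-google`.

```typescript
import { Readable } from "node:stream";

// Hypothetical helper (not part of @mastra/voice-google): drain a
// NodeJS.ReadableStream, such as the one speak() resolves to, into one Buffer.
async function streamToBuffer(stream: NodeJS.ReadableStream): Promise<Buffer> {
  const chunks: Buffer[] = [];
  for await (const chunk of stream) {
    // Chunks may arrive as strings or Buffers; normalize to Buffer.
    chunks.push(typeof chunk === "string" ? Buffer.from(chunk) : chunk);
  }
  return Buffer.concat(chunks);
}

// Example: an in-memory stream standing in for the audio returned by speak().
const demo = Readable.from([Buffer.from([1, 2]), Buffer.from([3])]);
streamToBuffer(demo).then((buf) => console.log(buf.length)); // prints 3
```

A stream can only be consumed once: either drain the result of speak() like this or pass it to listen(), not both.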
Constructor Parameters

speechModel?:

GoogleModelConfig
= { apiKey: process.env.GOOGLE_API_KEY }
Configuration for text-to-speech functionality

listeningModel?:

GoogleModelConfig
= { apiKey: process.env.GOOGLE_API_KEY }
Configuration for speech-to-text functionality

speaker?:

string
= 'en-US-Casual-K'
Default voice ID to use for text-to-speech

GoogleModelConfig

apiKey?:

string
Google Cloud API key. Falls back to the GOOGLE_API_KEY environment variable.
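The fallback described above can be sketched as follows. `resolveApiKey` is a hypothetical helper, not the library's internal code; it only illustrates the documented precedence: an explicit `apiKey` wins, otherwise `GOOGLE_API_KEY` is used, and a missing key is an error.

```typescript
// Mirrors the documented GoogleModelConfig shape.
interface GoogleModelConfig {
  apiKey?: string;
}

// Hypothetical helper: prefer the explicit apiKey, then fall back to the
// GOOGLE_API_KEY environment variable; fail loudly if neither is set.
function resolveApiKey(config?: GoogleModelConfig): string {
  const key = config?.apiKey ?? process.env.GOOGLE_API_KEY;
  if (!key) {
    throw new Error("No Google Cloud API key: pass apiKey or set GOOGLE_API_KEY");
  }
  return key;
}
```

Because speechModel and listeningModel are configured separately, each would be resolved independently, which is why the usage example can pass different keys to each.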

Methods

speak()

Converts text to speech using Google Cloud Text-to-Speech service.

input:

string | NodeJS.ReadableStream
Text to convert to speech. If a stream is provided, its contents are read as text before synthesis.

options?:

object
Speech synthesis options

options.speaker?:

string
Voice ID to use for this request

options.languageCode?:

string
Language code for the voice (e.g., 'en-US'). Defaults to the language code contained in the speaker ID, or 'en-US' if none is available

options.audioConfig?:

ISynthesizeSpeechRequest['audioConfig']
= { audioEncoding: 'LINEAR16' }
Audio configuration options from Google Cloud Text-to-Speech API

Returns: Promise<NodeJS.ReadableStream>
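The languageCode default for speak() can be illustrated with a small sketch. `resolveLanguageCode` is hypothetical, and it assumes Google voice names follow the `<language>-<region>-<type>-<variant>` pattern seen in 'en-US-Casual-K', so the first two segments form the language code; the library's actual parsing may differ.

```typescript
// Hypothetical helper: an explicit languageCode wins; otherwise take the
// language code from the speaker ID; otherwise fall back to 'en-US'.
function resolveLanguageCode(speaker?: string, languageCode?: string): string {
  if (languageCode) return languageCode;
  // Assumption: voice IDs look like 'en-US-Casual-K', so the first two
  // hyphen-separated segments are the language code.
  const parts = speaker?.split("-") ?? [];
  if (parts.length >= 2 && parts[0] && parts[1]) {
    return `${parts[0]}-${parts[1]}`;
  }
  return "en-US";
}
```

In practice this means passing languageCode is only necessary when it should differ from what the chosen voice ID already implies.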

listen()

Converts speech to text using Google Cloud Speech-to-Text service.

audioStream:

NodeJS.ReadableStream
Audio stream to transcribe

options?:

object
Recognition options

options.stream?:

boolean
Whether to use streaming recognition

options.config?:

IRecognitionConfig
= { encoding: 'LINEAR16', languageCode: 'en-US' }
Recognition configuration from Google Cloud Speech-to-Text API

Returns: Promise<string>
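When overriding only part of options.config, one defensive pattern is to spread the documented defaults yourself, so omitted fields keep their documented values regardless of how the library merges options internally. The sketch below models only a hypothetical subset of Google's IRecognitionConfig (the real type has many more fields and uses an enum for encoding), and `withRecognitionDefaults` is an illustrative helper, not part of the package.

```typescript
// Hypothetical subset of IRecognitionConfig, for illustration only.
interface RecognitionConfig {
  encoding?: string;
  languageCode?: string;
  sampleRateHertz?: number;
}

// The documented default for listen()'s options.config.
const DEFAULT_RECOGNITION_CONFIG: RecognitionConfig = {
  encoding: "LINEAR16",
  languageCode: "en-US",
};

// Hypothetical helper: caller-supplied fields override the defaults field by field.
function withRecognitionDefaults(config?: RecognitionConfig): RecognitionConfig {
  return { ...DEFAULT_RECOGNITION_CONFIG, ...config };
}
```

For example, `voice.listen(stream, { config: withRecognitionDefaults({ languageCode: "fr-FR" }) })` would keep LINEAR16 while switching the language.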

getSpeakers()

Returns an array of available voice options. Each element contains:

voiceId:

string
Unique identifier for the voice

languageCodes:

string[]
List of language codes supported by this voice
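getSpeakers() already accepts a languageCode filter, but a list fetched once can also be narrowed on the client, for example to pick a value for the speaker option. The sketch below uses the documented element shape; `voicesForLanguage` is a hypothetical helper, and the case-insensitive comparison is a design choice of this sketch, not documented behavior.

```typescript
// Shape of each element returned by getSpeakers(), as documented above.
interface VoiceOption {
  voiceId: string;
  languageCodes: string[];
}

// Hypothetical client-side filter: keep voices that list the given language
// code, compared case-insensitively.
function voicesForLanguage(voices: VoiceOption[], languageCode: string): VoiceOption[] {
  const wanted = languageCode.toLowerCase();
  return voices.filter((v) => v.languageCodes.some((c) => c.toLowerCase() === wanted));
}

// Illustrative data only; real IDs come from getSpeakers().
const sample: VoiceOption[] = [
  { voiceId: "en-US-Casual-K", languageCodes: ["en-US"] },
  { voiceId: "de-DE-Wavenet-B", languageCodes: ["de-DE"] },
];
console.log(voicesForLanguage(sample, "de-DE").map((v) => v.voiceId)); // prints [ 'de-DE-Wavenet-B' ]
```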

Important Notes

  1. A Google Cloud API key is required. Set it via the GOOGLE_API_KEY environment variable or pass it as apiKey in the speechModel and listeningModel constructor options.
  2. The default voice is 'en-US-Casual-K'.
  3. Both text-to-speech and speech-to-text use LINEAR16 as the default audio encoding.
  4. speak() accepts Google Cloud Text-to-Speech audio configuration (for example, audioEncoding) through options.audioConfig.
  5. listen() accepts Google Cloud Speech-to-Text recognition settings through options.config.
  6. getSpeakers() can filter the available voices by language code.
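Because LINEAR16 is uncompressed 16-bit (2-byte) PCM, the duration of a LINEAR16 buffer follows directly from its size, which is handy for sanity-checking audio before passing it to listen(). `linear16DurationSeconds` is a hypothetical helper; the sample rate and channel count must be supplied by the caller, and headerBytes lets you exclude a container header (for example 44 bytes for a canonical WAV header) if one is present.

```typescript
// Hypothetical helper: duration of LINEAR16 (16-bit PCM) audio in seconds.
// bytes per second = sampleRateHertz * channels * 2 bytes per sample.
function linear16DurationSeconds(
  byteLength: number,
  sampleRateHertz: number,
  channels = 1,
  headerBytes = 0,
): number {
  const bytesPerSecond = sampleRateHertz * channels * 2;
  return Math.max(0, byteLength - headerBytes) / bytesPerSecond;
}

console.log(linear16DurationSeconds(48000, 24000)); // prints 1
```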