Google

The Google Voice implementation in Mastra provides both text-to-speech (TTS) and speech-to-text (STT) capabilities using Google Cloud services. It supports multiple voices, languages, and advanced audio configuration options.

Usage Example

import { GoogleVoice } from "@mastra/voice-google";

// Initialize with default configuration (uses GOOGLE_API_KEY environment variable)
const voice = new GoogleVoice();

// Alternatively, initialize with custom configuration
// (named differently so both declarations can coexist in one file)
const customVoice = new GoogleVoice({
  speechModel: {
    apiKey: "your-speech-api-key",
  },
  listeningModel: {
    apiKey: "your-listening-api-key",
  },
  speaker: "en-US-Casual-K",
});

// Text-to-Speech
const audioStream = await voice.speak("Hello, world!", {
  languageCode: "en-US",
  audioConfig: {
    audioEncoding: "LINEAR16",
  },
});

// Speech-to-Text
const transcript = await voice.listen(audioStream, {
  config: {
    encoding: "LINEAR16",
    languageCode: "en-US",
  },
});

// Get available voices for a specific language
const voices = await voice.getSpeakers({ languageCode: "en-US" });

Constructor Parameters

speechModel?: GoogleModelConfig = { apiKey: process.env.GOOGLE_API_KEY }
Configuration for text-to-speech functionality

listeningModel?: GoogleModelConfig = { apiKey: process.env.GOOGLE_API_KEY }
Configuration for speech-to-text functionality

speaker?: string = 'en-US-Casual-K'
Default voice ID to use for text-to-speech

GoogleModelConfig

apiKey?: string
Google Cloud API key. Falls back to the GOOGLE_API_KEY environment variable.
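
The fallback described above can be pictured with a small sketch. The helper name `resolveApiKey`, the locally redeclared `GoogleModelConfig` interface, and the decision to throw when no key is found are assumptions made for illustration; they are not taken from the `@mastra/voice-google` source.

```typescript
// Local stand-in for the config shape documented above.
interface GoogleModelConfig {
  apiKey?: string;
}

// Hypothetical helper: prefer an explicit key, otherwise fall back to
// GOOGLE_API_KEY from the supplied environment map.
function resolveApiKey(
  config: GoogleModelConfig | undefined,
  env: Record<string, string | undefined>,
): string {
  const apiKey = config?.apiKey ?? env["GOOGLE_API_KEY"];
  if (!apiKey) {
    // Assumption: a missing key is treated as a configuration error.
    throw new Error("No Google API key: pass apiKey or set GOOGLE_API_KEY");
  }
  return apiKey;
}
```

In a Node.js process this would be called as `resolveApiKey(config, process.env)`. Passing the environment in as an argument keeps the sketch easy to test.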

Methods

speak()

Converts text to speech using Google Cloud Text-to-Speech service.

input: string | NodeJS.ReadableStream
Text to convert to speech. If a stream is provided, it will be converted to text first.

options?: object
Speech synthesis options

options.speaker?: string
Voice ID to use for this request

options.languageCode?: string
Language code for the voice (e.g., 'en-US'). Defaults to the language code from the speaker ID or 'en-US'

options.audioConfig?: ISynthesizeSpeechRequest['audioConfig'] = { audioEncoding: 'LINEAR16' }
Audio configuration options from Google Cloud Text-to-Speech API
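
To make the `languageCode` default more concrete, here is one plausible way a language code could be derived from a voice ID such as 'en-US-Casual-K'. The helper `languageCodeFromSpeaker` is hypothetical, and the rule it applies (take the first two hyphen-separated segments) is an assumption about the ID format, not a description of how the library actually parses it.

```typescript
// Hypothetical helper: "en-US-Casual-K" -> "en-US".
// Falls back to the given default when the ID does not look like
// "<language>-<REGION>-<rest>".
function languageCodeFromSpeaker(speaker: string, fallback = "en-US"): string {
  const parts = speaker.split("-");
  const language = parts[0];
  const region = parts[1];
  if (parts.length < 3 || !language || !region) {
    return fallback;
  }
  return `${language}-${region}`;
}
```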

Returns: Promise<NodeJS.ReadableStream>
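
Because `speak()` resolves to a stream rather than a buffer, callers that need the whole clip in memory (for example, to write it to disk) have to drain the stream first. The sketch below shows one way to do that with Node.js built-ins. `streamToBuffer` is a hypothetical helper, and `fakeAudio` is a stand-in stream so the example runs without Google credentials.

```typescript
import { Readable } from "node:stream";

// Hypothetical helper: drain a readable stream into a single Buffer.
async function streamToBuffer(stream: NodeJS.ReadableStream): Promise<Buffer> {
  const chunks: Buffer[] = [];
  for await (const chunk of stream) {
    // Streams may yield strings or Buffers; normalise to Buffer.
    chunks.push(typeof chunk === "string" ? Buffer.from(chunk) : chunk);
  }
  return Buffer.concat(chunks);
}

// Stand-in for the stream returned by speak(): three bytes in two chunks.
const fakeAudio = Readable.from([Buffer.from([0x01, 0x02]), Buffer.from([0x03])]);
const collected: Promise<Buffer> = streamToBuffer(fakeAudio);
```

With a real `GoogleVoice` instance, the same helper would be applied to the stream returned by `voice.speak(...)`.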

listen()

Converts speech to text using Google Cloud Speech-to-Text service.

audioStream: NodeJS.ReadableStream
Audio stream to transcribe

options?: object
Recognition options

options.stream?: boolean
Whether to use streaming recognition

options.config?: IRecognitionConfig = { encoding: 'LINEAR16', languageCode: 'en-US' }
Recognition configuration from Google Cloud Speech-to-Text API

Returns: Promise<string>
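
The default `LINEAR16` encoding refers to uncompressed 16-bit signed little-endian PCM samples. If the audio to transcribe is held as floating-point samples in the range [-1, 1], it needs converting before it matches that encoding. The following is a minimal sketch of that conversion; `floatToLinear16` is a hypothetical helper and is not part of `@mastra/voice-google`.

```typescript
// Hypothetical helper: convert float samples in [-1, 1] to
// 16-bit signed little-endian PCM (the LINEAR16 layout).
function floatToLinear16(samples: Float32Array): Buffer {
  const out = Buffer.alloc(samples.length * 2); // 2 bytes per sample
  for (let i = 0; i < samples.length; i++) {
    // Clamp out-of-range input so it cannot overflow the int16 range.
    const clamped = Math.max(-1, Math.min(1, samples[i] ?? 0));
    // Negative values scale to -32768, positive values to 32767.
    const value =
      clamped < 0 ? Math.round(clamped * 0x8000) : Math.round(clamped * 0x7fff);
    out.writeInt16LE(value, i * 2);
  }
  return out;
}
```

The resulting `Buffer` can be wrapped in a stream, for example with `Readable.from([pcm])` from `node:stream`, before being passed as `audioStream`.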

getSpeakers()

Returns an array of available voices, optionally filtered by language code. Each entry contains:

voiceId: string
Unique identifier for the voice

languageCodes: string[]
List of language codes supported by this voice
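
Given that result shape, a list of voices can also be narrowed on the client side. The sketch below assumes only the two fields documented above. The `SpeakerInfo` interface name and the `speakersForLanguage` helper are hypothetical, and the case-insensitive comparison is a design choice of this sketch rather than documented library behaviour.

```typescript
// Local stand-in for the entries returned by getSpeakers().
interface SpeakerInfo {
  voiceId: string;
  languageCodes: string[];
}

// Hypothetical helper: keep only voices that support the given language code.
function speakersForLanguage(
  speakers: SpeakerInfo[],
  languageCode: string,
): SpeakerInfo[] {
  const wanted = languageCode.toLowerCase();
  return speakers.filter((speaker) =>
    speaker.languageCodes.some((code) => code.toLowerCase() === wanted),
  );
}

// Illustrative data only; apart from 'en-US-Casual-K' these IDs are made up.
const sampleSpeakers: SpeakerInfo[] = [
  { voiceId: "en-US-Casual-K", languageCodes: ["en-US"] },
  { voiceId: "en-GB-Example-A", languageCodes: ["en-GB"] },
  { voiceId: "multi-Example-B", languageCodes: ["en-US", "es-US"] },
];
const usSpeakers = speakersForLanguage(sampleSpeakers, "en-US");
```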

Important Notes

  1. A Google Cloud API key is required. Set it via the GOOGLE_API_KEY environment variable or pass it in the constructor.
  2. The default voice is set to 'en-US-Casual-K'.
  3. Both text-to-speech and speech-to-text services use LINEAR16 as the default audio encoding.
  4. The speak() method supports advanced audio configuration through the Google Cloud Text-to-Speech API.
  5. The listen() method supports various recognition configurations through the Google Cloud Speech-to-Text API.
  6. Available voices can be filtered by language code using the getSpeakers() method.