The Google Voice implementation in Mastra provides both text-to-speech (TTS) and speech-to-text (STT) capabilities using Google Cloud services. It supports multiple voices, languages, and advanced audio configuration options.
Usage Example
import { GoogleVoice } from "@mastra/voice-google";

// Initialize with default configuration (uses GOOGLE_API_KEY environment variable)
const voice = new GoogleVoice();

// Initialize with custom configuration
// (named differently so both declarations can coexist in one file)
const customVoice = new GoogleVoice({
  speechModel: {
    apiKey: 'your-speech-api-key',
  },
  listeningModel: {
    apiKey: 'your-listening-api-key',
  },
  speaker: 'en-US-Casual-K',
});

// Text-to-Speech
const audioStream = await voice.speak("Hello, world!", {
  languageCode: 'en-US',
  audioConfig: {
    audioEncoding: 'LINEAR16',
  },
});

// Speech-to-Text
const transcript = await voice.listen(audioStream, {
  config: {
    encoding: 'LINEAR16',
    languageCode: 'en-US',
  },
});

// Get available voices for a specific language
const voices = await voice.getSpeakers({ languageCode: 'en-US' });
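The stream returned by speak() often has to be gathered into a single byte array before it can be saved or passed on. The following is a minimal sketch of that step. collectAudio is a hypothetical helper, not part of @mastra/voice-google, and it assumes the stream yields binary chunks. Node.js readable streams are async-iterable, so they fit the parameter type.

```typescript
// Hypothetical helper (not part of @mastra/voice-google): gathers an
// async-iterable audio stream into one contiguous byte array.
async function collectAudio(
  stream: AsyncIterable<Uint8Array | string>,
): Promise<Uint8Array> {
  const chunks: Uint8Array[] = [];
  let total = 0;
  for await (const chunk of stream) {
    // Assumption: audio arrives as binary chunks. A string chunk means a
    // text encoding was set on the stream, which would corrupt audio data.
    if (typeof chunk === "string") {
      throw new TypeError("Expected binary audio chunks, got a string");
    }
    chunks.push(chunk);
    total += chunk.length;
  }
  // Copy every chunk into a single array at its running offset.
  const out = new Uint8Array(total);
  let offset = 0;
  for (const chunk of chunks) {
    out.set(chunk, offset);
    offset += chunk.length;
  }
  return out;
}
```

With a voice instance like the one in the usage example, the result of await collectAudio(await voice.speak("Hello, world!")) could then be written to a file with fs.writeFile.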
Constructor Parameters
speechModel?:
GoogleModelConfig
= { apiKey: process.env.GOOGLE_API_KEY }
Configuration for text-to-speech functionality
listeningModel?:
GoogleModelConfig
= { apiKey: process.env.GOOGLE_API_KEY }
Configuration for speech-to-text functionality
speaker?:
string
= 'en-US-Casual-K'
Default voice ID to use for text-to-speech
GoogleModelConfig
apiKey?:
string
Google Cloud API key. Falls back to the GOOGLE_API_KEY environment variable.
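The fallback order described above can be sketched as follows. resolveApiKey is a hypothetical helper written for illustration, not the library's internal code, and throwing when no key is found is an assumption of this sketch.

```typescript
// Shape taken from the GoogleModelConfig description above.
interface GoogleModelConfig {
  apiKey?: string;
}

// Hypothetical helper: prefer the explicit key, then fall back to the
// GOOGLE_API_KEY entry of the supplied environment map.
function resolveApiKey(
  config: GoogleModelConfig | undefined,
  env: Record<string, string | undefined>,
): string {
  // `||` (not `??`) so that an empty explicit key also falls through.
  const key = config?.apiKey || env["GOOGLE_API_KEY"];
  if (!key) {
    // Assumption: a missing key is treated as a configuration error here;
    // the library's own behavior in this case is not documented above.
    throw new Error("No API key: pass apiKey or set GOOGLE_API_KEY");
  }
  return key;
}
```

In Node.js, process.env can be passed as the second argument. Taking the environment as a parameter keeps the helper easy to test.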
Methods
speak()
Converts text to speech using Google Cloud Text-to-Speech service.
input:
string | NodeJS.ReadableStream
Text to convert to speech. If a stream is provided, it will be converted to text first.
options?:
object
Speech synthesis options
options.speaker?:
string
Voice ID to use for this request
options.languageCode?:
string
Language code for the voice (e.g., 'en-US'). Defaults to the language code taken from the speaker ID, or 'en-US'.
options.audioConfig?:
ISynthesizeSpeechRequest['audioConfig']
= { audioEncoding: 'LINEAR16' }
Audio configuration options from Google Cloud Text-to-Speech API
Returns: Promise<NodeJS.ReadableStream>
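The languageCode default mentions deriving the code from the speaker ID. One way this could work is sketched below. languageCodeFromSpeaker is a hypothetical helper, and it assumes voice IDs start with a language and region pair, as 'en-US-Casual-K' does. The library's actual derivation may differ.

```typescript
const FALLBACK_LANGUAGE = "en-US";

// Hypothetical helper: derive "en-US" from a voice ID such as "en-US-Casual-K".
function languageCodeFromSpeaker(speaker?: string): string {
  if (!speaker) return FALLBACK_LANGUAGE;
  const parts = speaker.split("-");
  // Assumption: the first two hyphen-separated parts are language and region.
  if (parts.length < 2 || !parts[0] || !parts[1]) return FALLBACK_LANGUAGE;
  return `${parts[0]}-${parts[1]}`;
}
```

Passing options.languageCode explicitly avoids depending on any such derivation.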
listen()
Converts speech to text using Google Cloud Speech-to-Text service.
audioStream:
NodeJS.ReadableStream
Audio stream to transcribe
options?:
object
Recognition options
options.stream?:
boolean
Whether to use streaming recognition
options.config?:
IRecognitionConfig
= { encoding: 'LINEAR16', languageCode: 'en-US' }
Recognition configuration from Google Cloud Speech-to-Text API
Returns: Promise<string>
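The text above does not say whether a partial options.config is merged with the defaults or replaces them. Building a complete config on the caller's side avoids relying on either behavior. The sketch below does that. buildRecognitionConfig and RecognitionConfigSketch are hypothetical names, and the interface is a small stand-in for the real IRecognitionConfig type.

```typescript
// Minimal stand-in for the fields used here; the real IRecognitionConfig
// from Google Cloud Speech-to-Text has many more options.
interface RecognitionConfigSketch {
  encoding: string;
  languageCode: string;
  sampleRateHertz?: number;
}

// Defaults as documented for options.config above.
const DEFAULT_RECOGNITION_CONFIG: RecognitionConfigSketch = {
  encoding: "LINEAR16",
  languageCode: "en-US",
};

// Hypothetical helper: start from the documented defaults and apply overrides.
function buildRecognitionConfig(
  overrides: Partial<RecognitionConfigSketch> = {},
): RecognitionConfigSketch {
  return { ...DEFAULT_RECOGNITION_CONFIG, ...overrides };
}
```

The result could be passed as the config field of the listen() options, for example buildRecognitionConfig({ languageCode: 'fr-FR' }).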
getSpeakers()
Returns an array of available voices, optionally filtered by language code (as in getSpeakers({ languageCode: 'en-US' })). Each entry contains:
voiceId:
string
Unique identifier for the voice
languageCodes:
string[]
List of language codes supported by this voice
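Given the entry shape above, a list of voices that has already been fetched can also be narrowed on the client side. The sketch below is illustrative only: filterByLanguage is a hypothetical helper, and the sample array is made-up data, not real output from getSpeakers().

```typescript
// Shape of each entry, as described above.
interface SpeakerInfo {
  voiceId: string;
  languageCodes: string[];
}

// Hypothetical helper: keep voices that list the given language code.
function filterByLanguage(
  voices: SpeakerInfo[],
  languageCode: string,
): SpeakerInfo[] {
  return voices.filter((v) => v.languageCodes.includes(languageCode));
}

// Illustrative sample data, not real output from getSpeakers().
const sampleVoices: SpeakerInfo[] = [
  { voiceId: "en-US-Casual-K", languageCodes: ["en-US"] },
  { voiceId: "example-voice-fr", languageCodes: ["fr-FR"] },
];

const enVoiceIds = filterByLanguage(sampleVoices, "en-US").map((v) => v.voiceId);
console.log(enVoiceIds); // ["en-US-Casual-K"]
```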
Important Notes
- A Google Cloud API key is required. Set it via the GOOGLE_API_KEY environment variable or pass it in the constructor.
- The default voice is 'en-US-Casual-K'.
- Both text-to-speech and speech-to-text services use LINEAR16 as the default audio encoding.
- The speak() method supports advanced audio configuration through the Google Cloud Text-to-Speech API.
- The listen() method supports various recognition configurations through the Google Cloud Speech-to-Text API.
- Available voices can be filtered by language code using the getSpeakers() method.