Azure
The AzureVoice class in Mastra provides text-to-speech and speech-to-text capabilities using Microsoft Azure Cognitive Services.
Usage Example
This requires Azure Speech Services credentials that can be provided through environment variables or directly in the configuration:
import { AzureVoice } from "@mastra/voice-azure";

// Initialize with configuration
const voice = new AzureVoice({
  speechModel: {
    apiKey: "your-azure-speech-api-key", // Or use AZURE_API_KEY env var
    region: "eastus", // Or use AZURE_REGION env var
    voiceName: "en-US-AriaNeural", // Optional: specific voice for TTS
  },
  listeningModel: {
    apiKey: "your-azure-speech-api-key", // Or use AZURE_API_KEY env var
    region: "eastus", // Or use AZURE_REGION env var
    language: "en-US", // Optional: recognition language for STT
  },
  speaker: "en-US-JennyNeural", // Optional: default voice
});

// Convert text to speech
const audioStream = await voice.speak("Hello, how can I help you?", {
  speaker: "en-US-GuyNeural", // Optional: override default voice
});

// Convert speech to text
const text = await voice.listen(audioStream);
Configuration
Constructor Options
speechModel?: AzureSpeechConfig - Configuration for speech synthesis (text-to-speech)
listeningModel?: AzureSpeechConfig - Configuration for speech recognition (speech-to-text)
speaker?: string - Default voice ID used when speak() is called without a speaker override
AzureSpeechConfig
Configuration object for speech synthesis (speechModel) and recognition (listeningModel).
apiKey?: string - Azure Speech Services subscription key (falls back to the AZURE_API_KEY environment variable)
region?: string - Azure region, e.g. 'eastus' (falls back to the AZURE_REGION environment variable)
voiceName?: string - Voice used for synthesis; applies to speechModel only
language?: string - Recognition language, e.g. 'en-US'; applies to listeningModel only
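Taken together, the options imply shapes along these lines (a sketch inferred from the fields documented above; the interface names are illustrative, not taken from the package's type definitions):

// Sketch of the configuration shapes implied by the options above;
// the interface names here are illustrative, not from @mastra/voice-azure.
interface AzureSpeechConfig {
  apiKey?: string; // Azure Speech Services key; falls back to AZURE_API_KEY
  region?: string; // e.g. "eastus"; falls back to AZURE_REGION
  voiceName?: string; // TTS voice (speechModel only)
  language?: string; // recognition language (listeningModel only)
}

interface AzureVoiceOptions {
  speechModel?: AzureSpeechConfig;
  listeningModel?: AzureSpeechConfig;
  speaker?: string; // default voice ID
}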
Methods
speak()
Converts text to speech using Azure's neural text-to-speech service.
input: string - The text to convert to speech
options.speaker?: string - Voice ID that overrides the default speaker for this call
Returns: Promise<NodeJS.ReadableStream> - Audio stream in WAV format
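Because speak() resolves to a standard readable stream, the audio can be piped anywhere a WAV stream is accepted, for example to a file (a minimal sketch; the output path is illustrative):

import { createWriteStream } from "fs";
import { pipeline } from "stream/promises";

// Synthesize speech and save the WAV output to disk.
const greeting = await voice.speak("Welcome to Mastra!", {
  speaker: "en-US-AriaNeural",
});
await pipeline(greeting, createWriteStream("greeting.wav"));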
listen()
Transcribes audio using Azure's speech-to-text service.
audioStream: NodeJS.ReadableStream - The audio to transcribe, in WAV format
Returns: Promise<string> - The recognized text from the audio
Note: Language and recognition settings are configured in the listeningModel configuration during initialization, not passed as options to this method.
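For example, a prerecorded WAV file can be transcribed by passing a file stream (a minimal sketch; the file path is illustrative):

import { createReadStream } from "fs";

// Transcribe a WAV recording from disk.
const transcript = await voice.listen(createReadStream("meeting.wav"));
console.log(transcript);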
getSpeakers()
Returns an array of available voice options (200+ voices), where each entry contains:
voiceId: string - The full voice identifier (e.g., 'en-US-JennyNeural')
language: string - The language portion of the voice ID (e.g., 'en')
region: string - The region portion of the voice ID (e.g., 'US')
Returns: Promise<Array<{ voiceId: string; language: string; region: string; }>>
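The result is a plain array, so it can be narrowed with ordinary filtering; for example, keeping only British English voices by matching the voice ID prefix (a minimal sketch):

// List only British English voices by filtering on the voice ID prefix.
const voices = await voice.getSpeakers();
const britishVoices = voices.filter((v) => v.voiceId.startsWith("en-GB"));
console.log(britishVoices.map((v) => v.voiceId)); // e.g. ["en-GB-SoniaNeural", ...]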
Important Notes
Azure Speech Services vs Azure OpenAI
⚠️ Critical: This package uses Azure Speech Services, which is different from Azure OpenAI Services.
- DO NOT use your AZURE_OPENAI_API_KEY for this package
- DO use an Azure Speech Services subscription key (obtained from the Azure Portal under "Speech Services")
- These are separate Azure resources with different API keys and endpoints
Environment Variables
API keys and regions can be provided via constructor options or environment variables:
- AZURE_API_KEY - Your Azure Speech Services subscription key
- AZURE_REGION - Your Azure region (e.g., 'eastus', 'westeurope')
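With both variables set, the constructor can be called without inline credentials (a minimal sketch; the default speaker shown is optional):

// AZURE_API_KEY and AZURE_REGION are read from the environment,
// so no credentials need to appear in code.
import { AzureVoice } from "@mastra/voice-azure";

const voice = new AzureVoice({
  speaker: "en-US-JennyNeural", // optional default voice
});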
Voice Capabilities
- Azure offers 200+ neural voices across 50+ languages
- Each voice ID follows the format: {language}-{region}-{name}Neural (e.g., 'en-US-JennyNeural')
- Some voices include multilingual support or HD quality variants
- Audio output is in WAV format
- Audio input for recognition must be in WAV format
Available Voices
Azure provides 200+ neural voices across many languages. Some popular English voices include:
- US English: en-US-AriaNeural (Female, default), en-US-JennyNeural (Female), en-US-GuyNeural (Male), en-US-DavisNeural (Male), en-US-AvaNeural (Female), en-US-AndrewNeural (Male)
- British English: en-GB-SoniaNeural (Female), en-GB-RyanNeural (Male), en-GB-LibbyNeural (Female)
- Australian English: en-AU-NatashaNeural (Female), en-AU-WilliamNeural (Male)
To get a complete list of all 200+ voices:
const voices = await voice.getSpeakers();
console.log(voices); // Array of { voiceId, language, region }
For more information, see the Azure Neural TTS documentation.