# Azure
The `AzureVoice` class in Mastra provides text-to-speech and speech-to-text capabilities using Microsoft Azure Cognitive Services.
## Usage Example
This requires Azure Speech Services credentials that can be provided through environment variables or directly in the configuration:
```typescript
import { AzureVoice } from "@mastra/voice-azure";

// Initialize with configuration
const voice = new AzureVoice({
  speechModel: {
    apiKey: "your-azure-speech-api-key", // Or use AZURE_API_KEY env var
    region: "eastus", // Or use AZURE_REGION env var
    voiceName: "en-US-AriaNeural", // Optional: specific voice for TTS
  },
  listeningModel: {
    apiKey: "your-azure-speech-api-key", // Or use AZURE_API_KEY env var
    region: "eastus", // Or use AZURE_REGION env var
    language: "en-US", // Optional: recognition language for STT
  },
  speaker: "en-US-JennyNeural", // Optional: default voice
});

// Convert text to speech
const audioStream = await voice.speak("Hello, how can I help you?", {
  speaker: "en-US-GuyNeural", // Optional: override default voice
});

// Convert speech to text
const text = await voice.listen(audioStream);
```
## Configuration

### Constructor Options
- `speechModel?`: `AzureSpeechConfig` - Configuration for text-to-speech (see below).
- `listeningModel?`: `AzureSpeechConfig` - Configuration for speech-to-text (see below).
- `speaker?`: `string` - Default voice ID for speech synthesis.
### AzureSpeechConfig

Configuration object for speech synthesis (`speechModel`) and recognition (`listeningModel`).
- `apiKey?`: `string` - Azure Speech Services subscription key. Falls back to the `AZURE_API_KEY` environment variable.
- `region?`: `string` - Azure region (e.g., `eastus`). Falls back to the `AZURE_REGION` environment variable.
- `voiceName?`: `string` - Voice to use for synthesis (applies to `speechModel`).
- `language?`: `string` - Recognition language (applies to `listeningModel`).
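For example, since both models share the same `AzureSpeechConfig` shape, a sketch might set only the field relevant to each (assuming credentials come from the environment variables described below):

```typescript
import { AzureVoice } from "@mastra/voice-azure";

// Sketch: voiceName applies to synthesis, language to recognition.
// apiKey and region are assumed to come from AZURE_API_KEY / AZURE_REGION.
const voice = new AzureVoice({
  speechModel: { voiceName: "en-US-AriaNeural" },
  listeningModel: { language: "en-GB" },
});
```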
## Methods

### speak()
Converts text to speech using Azure's neural text-to-speech service.
- `input`: The text to convert to speech.
- `options.speaker?`: `string` - Voice ID to use for this call, overriding the default `speaker`.

Returns: `Promise<NodeJS.ReadableStream>` - Audio stream in WAV format
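Because `speak()` resolves to a standard Node.js readable stream, it can be piped anywhere streams are accepted. A minimal sketch, assuming the `voice` instance from the usage example, that writes the WAV output to disk:

```typescript
import { createWriteStream } from "node:fs";

// Write the synthesized WAV audio to a local file.
const audio = await voice.speak("Welcome to Mastra!");
audio.pipe(createWriteStream("welcome.wav"));
```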
### listen()
Transcribes audio using Azure's speech-to-text service.
- `audioStream`: The audio stream to transcribe (WAV format).

Returns: `Promise<string>` - The recognized text from the audio

**Note:** Language and recognition settings are configured in the `listeningModel` configuration during initialization, not passed as options to this method.
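For instance, a local recording can be transcribed by passing a file stream (a sketch; `input.wav` is a placeholder path for a WAV file):

```typescript
import { createReadStream } from "node:fs";

// Transcribe a local WAV recording.
const transcript = await voice.listen(createReadStream("input.wav"));
console.log(transcript);
```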
### getSpeakers()

Returns an array of available voice options (200+ voices), where each entry contains:
- `voiceId`: The full voice identifier (e.g., `en-US-AriaNeural`).
- `language`: The voice's language code.
- `region`: The voice's regional variant.

Returns: `Promise<Array<{ voiceId: string; language: string; region: string; }>>`
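Since the result is a plain array, it can be filtered client-side. A sketch that lists only British English voices, relying on the documented `voiceId` format:

```typescript
// Keep only voices whose IDs start with the en-GB locale prefix.
const voices = await voice.getSpeakers();
const britishVoices = voices.filter((v) => v.voiceId.startsWith("en-GB"));
console.log(britishVoices.map((v) => v.voiceId));
```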
## Important Notes

### Azure Speech Services vs Azure OpenAI
⚠️ **Critical:** This package uses **Azure Speech Services**, which is different from Azure OpenAI Services.

- **DO NOT** use your `AZURE_OPENAI_API_KEY` for this package
- **DO** use an Azure Speech Services subscription key (obtain from the Azure Portal under "Speech Services")
- These are separate Azure resources with different API keys and endpoints
### Environment Variables
API keys and regions can be provided via constructor options or environment variables:
- `AZURE_API_KEY` - Your Azure Speech Services subscription key
- `AZURE_REGION` - Your Azure region (e.g., `eastus`, `westeurope`)
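With both variables set, credentials can be omitted from code entirely. A minimal sketch, assuming the environment-variable fallback described above:

```typescript
import { AzureVoice } from "@mastra/voice-azure";

// AZURE_API_KEY and AZURE_REGION are read from the environment,
// so no credentials appear in code.
const voice = new AzureVoice({
  speaker: "en-US-AriaNeural", // optional default voice
});
```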
### Voice Capabilities
- Azure offers 200+ neural voices across 50+ languages
- Each voice ID follows the format `{language}-{region}-{name}Neural` (e.g., `en-US-JennyNeural`); a quick illustration of this naming scheme follows this list
- Some voices include multilingual support or HD quality variants
- Audio output is in WAV format
- Audio input for recognition must be in WAV format
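As a quick illustration of the ID format (string manipulation only, not an API call):

```typescript
// Decompose a voice ID per the {language}-{region}-{name}Neural scheme.
const [language, region, name] = "en-US-JennyNeural".split("-");
// language === "en", region === "US", name === "JennyNeural"
```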
### Available Voices
Azure provides 200+ neural voices across many languages. Some popular English voices include:
- **US English:**
  - `en-US-AriaNeural` (Female, default)
  - `en-US-JennyNeural` (Female)
  - `en-US-GuyNeural` (Male)
  - `en-US-DavisNeural` (Male)
  - `en-US-AvaNeural` (Female)
  - `en-US-AndrewNeural` (Male)
- **British English:**
  - `en-GB-SoniaNeural` (Female)
  - `en-GB-RyanNeural` (Male)
  - `en-GB-LibbyNeural` (Female)
- **Australian English:**
  - `en-AU-NatashaNeural` (Female)
  - `en-AU-WilliamNeural` (Male)
To get a complete list of all 200+ voices:
```typescript
const voices = await voice.getSpeakers();
console.log(voices); // Array of { voiceId, language, region }
```
For more information, see the Azure Neural TTS documentation.