# Azure

The AzureVoice class in Mastra provides text-to-speech and speech-to-text capabilities using Microsoft Azure Cognitive Services.

## Usage Example

This requires Azure Speech Services credentials, which can be provided through environment variables or directly in the configuration:

```typescript
import { AzureVoice } from "@mastra/voice-azure";

// Initialize with configuration
const voice = new AzureVoice({
  speechModel: {
    apiKey: "your-azure-speech-api-key", // Or use AZURE_API_KEY env var
    region: "eastus", // Or use AZURE_REGION env var
    voiceName: "en-US-AriaNeural", // Optional: specific voice for TTS
  },
  listeningModel: {
    apiKey: "your-azure-speech-api-key", // Or use AZURE_API_KEY env var
    region: "eastus", // Or use AZURE_REGION env var
    language: "en-US", // Optional: recognition language for STT
  },
  speaker: "en-US-JennyNeural", // Optional: default voice
});

// Convert text to speech
const audioStream = await voice.speak("Hello, how can I help you?", {
  speaker: "en-US-GuyNeural", // Optional: override default voice
});

// Convert speech to text
const text = await voice.listen(audioStream);
```

## Configuration

### Constructor Options

- `speechModel?: AzureSpeechConfig` - Configuration for text-to-speech synthesis.
- `listeningModel?: AzureSpeechConfig` - Configuration for speech-to-text recognition.
- `speaker?: string` - Default voice ID for speech synthesis.

### AzureSpeechConfig

Configuration object for speech synthesis (`speechModel`) and recognition (`listeningModel`).

- `apiKey?: string` - Azure Speech Services API key (NOT an Azure OpenAI key). Falls back to the `AZURE_API_KEY` environment variable.
- `region?: string` - Azure region (e.g., 'eastus', 'westeurope'). Falls back to the `AZURE_REGION` environment variable.
- `voiceName?: string` - Voice ID for speech synthesis (e.g., 'en-US-AriaNeural', 'en-US-JennyNeural'). Only used in `speechModel`. See the voice list below.
- `language?: string` - Recognition language code (e.g., 'en-US', 'fr-FR'). Only used in `listeningModel`.

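The fallback order described for `apiKey` and `region` (explicit option first, environment variable second) can be sketched as below. This is a minimal illustration, not the package's implementation: `resolveAzureCredentials`, `SpeechCredentials`, and `Env` are hypothetical names, and throwing on a missing value is an assumption about how one might want to fail early.

```typescript
// Hypothetical helper, NOT part of @mastra/voice-azure. It only illustrates
// the documented order: explicit option first, environment variable second.
interface SpeechCredentials {
  apiKey?: string;
  region?: string;
}

type Env = Record<string, string | undefined>;

function resolveAzureCredentials(
  config: SpeechCredentials,
  env: Env,
): { apiKey: string; region: string } {
  // Prefer the explicit option, then fall back to the environment variable.
  const apiKey = config.apiKey ?? env["AZURE_API_KEY"];
  const region = config.region ?? env["AZURE_REGION"];

  // Assumption: fail early with a clear message when a value is missing.
  if (!apiKey) {
    throw new Error("Missing Azure Speech key: set apiKey or AZURE_API_KEY");
  }
  if (!region) {
    throw new Error("Missing Azure region: set region or AZURE_REGION");
  }
  return { apiKey, region };
}

// In Node.js you would pass process.env; a literal object is used here so
// the sketch runs in any TypeScript environment.
const fakeEnv: Env = {
  AZURE_API_KEY: "key-from-env",
  AZURE_REGION: "westeurope",
};

// apiKey comes from the environment, region from the explicit option.
const resolved = resolveAzureCredentials({ region: "eastus" }, fakeEnv);
console.log(resolved);
```

Running a check like this before constructing `AzureVoice` may help surface a missing Speech Services key at startup rather than on the first `speak()` or `listen()` call.
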
## Methods

### speak()

Converts text to speech using Azure's neural text-to-speech service.

- `input: string | NodeJS.ReadableStream` - Text or text stream to convert to speech.
- `options.speaker?: string` - Voice ID to use for speech synthesis (e.g., 'en-US-JennyNeural'). Overrides the default voice.

Returns: `Promise<NodeJS.ReadableStream>` - Audio stream in WAV format

### listen()

Transcribes audio using Azure's speech-to-text service.

- `audioStream: NodeJS.ReadableStream` - Audio stream to transcribe. Must be in WAV format.

Returns: `Promise<string>` - The recognized text from the audio

**Note:** Language and recognition settings are configured in the `listeningModel` configuration during initialization, not passed as options to this method.

### getSpeakers()

Returns an array of available voice options (200+ voices), where each entry contains:

- `voiceId: string` - Unique identifier for the voice (e.g., 'en-US-JennyNeural', 'fr-FR-DeniseNeural')
- `language: string` - Language code extracted from the voice ID (e.g., 'en', 'fr')
- `region: string` - Region code extracted from the voice ID (e.g., 'US', 'GB', 'FR')

Returns: `Promise<Array<{ voiceId: string; language: string; region: string }>>`

## Important Notes

### Azure Speech Services vs Azure OpenAI

**⚠️ Critical:** This package uses **Azure Speech Services**, which is different from **Azure OpenAI Services**.

- **DO NOT** use your `AZURE_OPENAI_API_KEY` for this package
- **DO** use an Azure Speech Services subscription key (obtain from Azure Portal under "Speech Services")
- These are separate Azure resources with different API keys and endpoints

### Environment Variables

API keys and regions can be provided via constructor options or environment variables:

- `AZURE_API_KEY` - Your Azure Speech Services subscription key
- `AZURE_REGION` - Your Azure region (e.g., 'eastus', 'westeurope')

### Voice Capabilities

- Azure offers 200+ neural voices across 50+ languages
- Each voice ID follows the format: `{language}-{region}-{name}Neural` (e.g., 'en-US-JennyNeural')
- Some voices include multilingual support or HD quality variants
- Audio output is in WAV format
- Audio input for recognition must be in WAV format

## Available Voices

Azure provides 200+ neural voices across many languages. Some popular English voices include:

- **US English:**
  - `en-US-AriaNeural` (Female, default)
  - `en-US-JennyNeural` (Female)
  - `en-US-GuyNeural` (Male)
  - `en-US-DavisNeural` (Male)
  - `en-US-AvaNeural` (Female)
  - `en-US-AndrewNeural` (Male)
- **British English:**
  - `en-GB-SoniaNeural` (Female)
  - `en-GB-RyanNeural` (Male)
  - `en-GB-LibbyNeural` (Female)
- **Australian English:**
  - `en-AU-NatashaNeural` (Female)
  - `en-AU-WilliamNeural` (Male)

To get a complete list of all 200+ voices:

```typescript
const voices = await voice.getSpeakers();
console.log(voices); // Array of { voiceId, language, region }
```

For more information, see the [Azure Neural TTS documentation](https://learn.microsoft.com/en-us/azure/cognitive-services/speech-service/language-support?tabs=tts).
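
The `{language}-{region}-{name}Neural` voice ID format and the `{ voiceId, language, region }` entries returned by `getSpeakers()` suggest a simple way to narrow the list to one locale. The sketch below is illustrative only: `parseVoiceId`, `filterByLocale`, and `SpeakerInfo` are hypothetical names, the sample data is hard-coded rather than fetched from Azure, and voice IDs that deviate from the documented three-part pattern may not split as expected.

```typescript
// Hypothetical helpers for illustration; the entry shape mirrors the
// { voiceId, language, region } fields documented for getSpeakers().
interface SpeakerInfo {
  voiceId: string;
  language: string;
  region: string;
}

// Split a voice ID such as "en-US-JennyNeural" into language and region.
function parseVoiceId(voiceId: string): SpeakerInfo {
  const [language, region, ...nameParts] = voiceId.split("-");
  if (!language || !region || nameParts.length === 0) {
    throw new Error(`Unexpected voice ID format: ${voiceId}`);
  }
  return { voiceId, language, region };
}

// Keep only speakers matching a language and, optionally, a region.
function filterByLocale(
  speakers: SpeakerInfo[],
  language: string,
  region?: string,
): SpeakerInfo[] {
  return speakers.filter(
    (s) =>
      s.language === language && (region === undefined || s.region === region),
  );
}

// Hard-coded sample standing in for the result of voice.getSpeakers().
const sample: SpeakerInfo[] = [
  "en-US-JennyNeural",
  "en-GB-RyanNeural",
  "fr-FR-DeniseNeural",
].map(parseVoiceId);

// Only the British English voice remains after filtering.
const british = filterByLocale(sample, "en", "GB");
console.log(british.map((s) => s.voiceId));
```

With a real `AzureVoice` instance, the array returned by `await voice.getSpeakers()` could presumably be passed to `filterByLocale` directly, since it already carries `language` and `region` fields.
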