# Speech-to-Text (STT)

Speech-to-Text (STT) in Mastra provides a standardized interface for converting audio input into text across multiple service providers. STT helps you create voice-enabled applications that respond to human speech, enabling hands-free interaction, accessibility for users with disabilities, and more natural human-computer interfaces.

## Configuration

To use STT in Mastra, provide a `listeningModel` when initializing the voice provider. This includes parameters such as:

- **`name`**: The specific STT model to use.
- **`apiKey`**: Your API key for authentication.
- **Provider-specific options**: Additional options that may be required or supported by the specific voice provider.

**Note**: All of these parameters are optional. If you omit them, the voice provider falls back to its own default settings, which vary by provider.

```typescript
import { OpenAIVoice } from "@mastra/voice-openai";

const voice = new OpenAIVoice({
  listeningModel: {
    name: "whisper-1",
    apiKey: process.env.OPENAI_API_KEY,
  },
});

// If you are using the default settings, the configuration can be
// simplified to:
// const voice = new OpenAIVoice();
```

## Available Providers

Mastra supports several Speech-to-Text providers, each with its own capabilities and strengths:

- [**OpenAI**](https://mastra.ai/reference/voice/openai/llms.txt) - High-accuracy transcription with Whisper models
- [**Azure**](https://mastra.ai/reference/voice/azure/llms.txt) - Microsoft's speech recognition with enterprise-grade reliability
- [**ElevenLabs**](https://mastra.ai/reference/voice/elevenlabs/llms.txt) - Advanced speech recognition with support for multiple languages
- [**Google**](https://mastra.ai/reference/voice/google/llms.txt) - Google's speech recognition with extensive language support
- [**Cloudflare**](https://mastra.ai/reference/voice/cloudflare/llms.txt) - Edge-optimized speech recognition for low-latency applications
- [**Deepgram**](https://mastra.ai/reference/voice/deepgram/llms.txt) - AI-powered speech recognition with high accuracy for various accents
- [**Sarvam**](https://mastra.ai/reference/voice/sarvam/llms.txt) - Specialized in Indic languages and accents

Each provider is implemented as a separate package that you can install as needed:

```bash
pnpm add @mastra/voice-openai@latest # Example for OpenAI
```

## Using the Listen Method

The primary method for STT is the `listen()` method, which converts spoken audio into text. Here's how to use it:

```typescript
import { Agent } from "@mastra/core/agent";
import { OpenAIVoice } from "@mastra/voice-openai";
import { getMicrophoneStream } from "@mastra/node-audio";

const voice = new OpenAIVoice();

const agent = new Agent({
  id: "voice-agent",
  name: "Voice Agent",
  instructions:
    "You are a voice assistant that provides recommendations based on user input.",
  model: "openai/gpt-5.1",
  voice,
});

const audioStream = getMicrophoneStream(); // Assume this function gets audio input

const transcript = await agent.voice.listen(audioStream, {
  filetype: "m4a", // Optional: specify the audio file type
});

console.log(`User said: ${transcript}`);

const { text } = await agent.generate(
  `Based on what the user said, provide them a recommendation: ${transcript}`,
);

console.log(`Recommendation: ${text}`);
```

Check out the [Adding Voice to Agents](https://mastra.ai/docs/agents/adding-voice/llms.txt) documentation to learn how to use STT in an agent.
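The `filetype` option in the example above suggests `listen()` also works with pre-recorded audio. Here is a minimal sketch under that assumption, passing a Node.js read stream instead of a microphone stream; `audio/recording.mp3` is a placeholder path for illustration:

```typescript
import { createReadStream } from "node:fs";
import { OpenAIVoice } from "@mastra/voice-openai";

const voice = new OpenAIVoice();

// Stream a pre-recorded file from disk instead of the microphone.
// "audio/recording.mp3" is a placeholder path, not a real fixture.
const fileStream = createReadStream("audio/recording.mp3");

// filetype tells the provider how to decode the incoming stream.
const transcript = await voice.listen(fileStream, { filetype: "mp3" });

console.log(`Transcript: ${transcript}`);
```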
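Because the interface is standardized, switching providers should mostly mean changing which class you construct. The sketch below assumes the Deepgram package exports a `DeepgramVoice` class that mirrors the `OpenAIVoice` configuration shape, and that `"nova-2"` is a valid Deepgram model name; check the provider's reference page for the exact class name and options:

```typescript
import { DeepgramVoice } from "@mastra/voice-deepgram";
import { getMicrophoneStream } from "@mastra/node-audio";

// Assumed to mirror the OpenAIVoice configuration shape shown above;
// see the Deepgram reference page for the options it actually supports.
const voice = new DeepgramVoice({
  listeningModel: {
    name: "nova-2", // assumed Deepgram model name
    apiKey: process.env.DEEPGRAM_API_KEY,
  },
});

const transcript = await voice.listen(getMicrophoneStream());
console.log(`User said: ${transcript}`);
```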