
Smart Voice Memo App

The following code snippets show how to implement Speech-to-Text (STT) in a smart voice memo application built with Next.js and a direct Mastra integration. For details on integrating Mastra with Next.js, see the Integrate with Next.js documentation.

Creating an Agent with STT Capabilities

The following example shows how to initialize a voice-enabled agent with OpenAI’s STT capabilities:

src/mastra/agents/index.ts
import { openai } from '@ai-sdk/openai';
import { Agent } from '@mastra/core/agent';
import { OpenAIVoice } from '@mastra/voice-openai';
 
const instructions = `
You are an AI note assistant tasked with providing concise, structured summaries of their content... // omitted for brevity
`;
 
export const noteTakerAgent = new Agent({
  name: 'Note Taker Agent',
  instructions,
  model: openai('gpt-4o'),
  voice: new OpenAIVoice(), // Add OpenAI voice provider with default configuration
});

Registering the Agent with Mastra

This snippet demonstrates how to register the STT-enabled agent with your Mastra instance:

src/mastra/index.ts
import { createLogger } from '@mastra/core/logger';
import { Mastra } from '@mastra/core/mastra';
 
import { noteTakerAgent } from './agents';
 
export const mastra = new Mastra({
  agents: { noteTakerAgent }, // Register the note taker agent
  logger: createLogger({
    name: 'Mastra',
    level: 'info',
  }),
});

Processing Audio for Transcription

The following code shows how to receive audio from a web request and use the agent’s STT capabilities to transcribe it:

app/api/audio/route.ts
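import { mastra } from '@/src/mastra'; // Import the Mastra instance
import { Readable } from 'node:stream';
 
export async function POST(req: Request) {
  // Get the audio file from the request
  const formData = await req.formData();
  const audioFile = formData.get('audio');
 
  // Reject requests that do not carry an uploaded file in the 'audio' field
  if (!audioFile || typeof audioFile === 'string') {
    return new Response(JSON.stringify({ error: 'Missing audio file' }), {
      status: 400,
      headers: { 'Content-Type': 'application/json' },
    });
  }
 
  // Convert the uploaded file into a Node.js readable stream
  const arrayBuffer = await audioFile.arrayBuffer();
  const buffer = Buffer.from(arrayBuffer);
  const readable = Readable.from(buffer);
 
  // Get the note taker agent from the Mastra instance
  const noteTakerAgent = mastra.getAgent('noteTakerAgent');
 
  // Transcribe the audio file (text is undefined if the agent has no voice provider)
  const text = await noteTakerAgent.voice?.listen(readable);
 
  return new Response(JSON.stringify({ text }), {
    headers: { 'Content-Type': 'application/json' },
  });
}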
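
The route above expects a multipart form upload with the recording in the `audio` field and responds with JSON of the shape `{ text }`. A minimal client-side sketch of that exchange might look like the following. The helper name `transcribeRecording`, the `FetchLike` type, the `memo.webm` file name, and the `/api/audio` URL (inferred from the route file's location) are illustrative assumptions, not part of the example app.

```typescript
// Hypothetical client-side helper; not taken from the example repository.
// FetchLike lets callers swap in a stub for the global fetch when testing.
type FetchLike = (input: string, init?: RequestInit) => Promise<Response>;

async function transcribeRecording(
  recording: Blob,
  fetchImpl: FetchLike = fetch,
): Promise<string> {
  // The route handler reads the upload from the 'audio' form field.
  const formData = new FormData();
  formData.append('audio', recording, 'memo.webm'); // file name is a placeholder

  // Leave Content-Type unset so the multipart boundary is generated automatically.
  // The '/api/audio' path is assumed from app/api/audio/route.ts.
  const response = await fetchImpl('/api/audio', {
    method: 'POST',
    body: formData,
  });

  if (!response.ok) {
    throw new Error(`Transcription request failed with status ${response.status}`);
  }

  // The handler returns { text }; fall back to an empty string if it is absent.
  const { text } = (await response.json()) as { text?: string };
  return text ?? '';
}
```

In a browser, the `recording` argument would typically be a `Blob` assembled from `MediaRecorder` chunks. Accepting the fetch implementation as a parameter is a design choice that keeps the helper easy to exercise without a running server.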

You can view the complete implementation of the Smart Voice Memo App on our GitHub repository.





