Smart Voice Memo App
The following code snippets provide example implementations of Speech-to-Text (STT) functionality in a smart voice memo application using Next.js with direct integration of Mastra. For more details on integrating Mastra with Next.js, please refer to our Integrate with Next.js documentation.
Creating an Agent with STT Capabilities
The following example shows how to initialize a voice-enabled agent with OpenAI’s STT capabilities:
src/mastra/agents/index.ts
import { openai } from '@ai-sdk/openai';
import { Agent } from '@mastra/core/agent';
import { OpenAIVoice } from '@mastra/voice-openai';
const instructions = `
You are an AI note assistant tasked with providing concise, structured summaries of their content... // omitted for brevity
`;
export const noteTakerAgent = new Agent({
name: 'Note Taker Agent',
instructions: instructions,
model: openai('gpt-4o'),
voice: new OpenAIVoice(), // Add OpenAI voice provider with default configuration
});
Registering the Agent with Mastra
This snippet demonstrates how to register the STT-enabled agent with your Mastra instance:
src/mastra/index.ts
import { createLogger } from '@mastra/core/logger';
import { Mastra } from '@mastra/core/mastra';
import { noteTakerAgent } from './agents';
export const mastra = new Mastra({
agents: { noteTakerAgent }, // Register the note taker agent
logger: createLogger({
name: 'Mastra',
level: 'info',
}),
});
Processing Audio for Transcription
The following code shows how to receive audio from a web request and use the agent’s STT capabilities to transcribe it:
app/api/audio/route.ts
import { mastra } from '@/src/mastra'; // Import the Mastra instance
import { Readable } from 'node:stream';
export async function POST(req: Request) {
// Get the audio file from the request
const formData = await req.formData();
const audioFile = formData.get('audio') as File;
const arrayBuffer = await audioFile.arrayBuffer();
const buffer = Buffer.from(arrayBuffer);
const readable = Readable.from(buffer);
// Get the note taker agent from the Mastra instance
const noteTakerAgent = mastra.getAgent('noteTakerAgent');
// Transcribe the audio file
const text = await noteTakerAgent.voice?.listen(readable);
return new Response(JSON.stringify({ text }), {
headers: { 'Content-Type': 'application/json' },
});
}
You can view the complete implementation of the Smart Voice Memo App on our GitHub repository.
View Example on GitHub