Call Analysis with Mastra
This guide demonstrates how to build a complete voice conversation system with analytics using Mastra. The example includes real-time speech-to-speech conversation, recording management, and integration with Roark Analytics for call analysis.
Overview
The system creates a voice conversation with a Mastra agent, records the entire interaction, uploads the recording to Cloudinary for storage, and then sends the conversation data to Roark Analytics for detailed call analysis.
Setup
Prerequisites
- OpenAI API key for speech-to-text and text-to-speech capabilities
- Cloudinary account for audio file storage
- Roark Analytics API key for call analysis
Environment Configuration
Create a `.env` file based on the sample provided:

```bash
OPENAI_API_KEY=
CLOUDINARY_CLOUD_NAME=
CLOUDINARY_API_KEY=
CLOUDINARY_API_SECRET=
ROARK_API_KEY=
```
Installation
Install the required dependencies:
```bash
npm install
```
Implementation
Creating the Mastra Agent
First, we define our agent with voice capabilities:
```typescript
import { openai } from '@ai-sdk/openai';
import { Agent } from '@mastra/core/agent';
import { createTool } from '@mastra/core/tools';
import { OpenAIRealtimeVoice } from '@mastra/voice-openai-realtime';
import { z } from 'zod';

// Define an agent with realtime voice capabilities and a simple greeting tool
export const speechToSpeechServer = new Agent({
  name: 'mastra',
  instructions: 'You are a helpful assistant.',
  voice: new OpenAIRealtimeVoice(),
  model: openai('gpt-4o'),
  tools: {
    salutationTool: createTool({
      id: 'salutationTool',
      description: 'Read the result of the tool',
      inputSchema: z.object({ name: z.string() }),
      outputSchema: z.object({ message: z.string() }),
      execute: async ({ context }) => {
        return { message: `Hello ${context.name}!` };
      },
    }),
  },
});
```
Initializing Mastra
Register the agent with Mastra:
```typescript
import { Mastra } from '@mastra/core';

import { speechToSpeechServer } from './agents';

export const mastra = new Mastra({
  agents: {
    speechToSpeechServer,
  },
});
```
Cloudinary Integration for Audio Storage
Set up Cloudinary for storing the recorded audio files:
```typescript
import { v2 as cloudinary } from 'cloudinary';

// Configure Cloudinary from environment variables
cloudinary.config({
  cloud_name: process.env.CLOUDINARY_CLOUD_NAME,
  api_key: process.env.CLOUDINARY_API_KEY,
  api_secret: process.env.CLOUDINARY_API_SECRET,
});

// Upload the recorded audio file and return its public URL
export async function uploadToCloudinary(path: string) {
  const response = await cloudinary.uploader.upload(path, { resource_type: 'raw' });
  console.log(response);
  return response.url;
}
```
Main Application Logic
The main application orchestrates the conversation flow, recording, and analytics integration:
```typescript
import { Roark } from '@roarkanalytics/sdk';
import chalk from 'chalk';

import { mastra } from './mastra';
import { createConversation, formatToolInvocations } from './utils';
import { uploadToCloudinary } from './upload';

import fs from 'fs';

const client = new Roark({
  bearerToken: process.env.ROARK_API_KEY,
});

async function speechToSpeechServerExample() {
  const { start, stop } = createConversation({
    mastra,
    recordingPath: './speech-to-speech-server.mp3',
    providerOptions: {},
    initialMessage: 'Howdy partner',
    onConversationEnd: async (props) => {
      // Write the recording to disk and upload it to Cloudinary
      fs.writeFileSync(props.recordingPath, props.audioBuffer);
      const url = await uploadToCloudinary(props.recordingPath);

      // Send the recording and conversation metadata to Roark for analysis
      console.log('Send to Roark', url);
      const response = await client.callAnalysis.create({
        recordingUrl: url,
        startedAt: props.startedAt,
        callDirection: 'INBOUND',
        interfaceType: 'PHONE',
        participants: [
          {
            role: 'AGENT',
            spokeFirst: props.agent.spokeFirst,
            name: props.agent.name,
            phoneNumber: props.agent.phoneNumber,
          },
          { role: 'CUSTOMER', name: 'Yujohn Nattrass', phoneNumber: '987654321' },
        ],
        properties: props.metadata,
        toolInvocations: formatToolInvocations(props.toolInvocations),
      });

      console.log('Call Recording Posted:', response.data);
    },
    onWriting: (ev) => {
      // Stream the assistant's transcript to the terminal as it is generated
      if (ev.role === 'assistant') {
        process.stdout.write(chalk.blue(ev.text));
      }
    },
  });

  await start();

  process.on('SIGINT', async () => {
    await stop();
  });
}

speechToSpeechServerExample().catch(console.error);
```
Conversation Utilities
The `utils.ts` file contains helper functions for managing the conversation (a rough sketch of their shape follows the list below), including:
- Creating and managing the conversation session
- Handling audio recording
- Processing tool invocations
- Managing conversation lifecycle events
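The full implementation of these helpers lives in the example repository. The sketch below is only an illustration of the shape they appear to take, inferred from how `createConversation` and `formatToolInvocations` are called in the main application; the exact property names on the callback payload (such as `audioBuffer`, `startedAt`, and `toolInvocations`) are assumptions beyond what the calling code shows.

```typescript
// Illustrative type sketch only -- not the actual utils.ts from the example repository.
import type { Mastra } from '@mastra/core';

export interface ToolInvocation {
  // Assumed fields; the real example may track different data per tool call
  toolName: string;
  startedAt: Date;
  endedAt: Date;
  parameters: Record<string, unknown>;
}

export interface ConversationEndProps {
  recordingPath: string;
  audioBuffer: Buffer;
  startedAt: Date;
  metadata: Record<string, string>;
  agent: { name: string; spokeFirst: boolean; phoneNumber: string };
  toolInvocations: ToolInvocation[];
}

export interface CreateConversationOptions {
  mastra: Mastra;
  recordingPath: string;
  providerOptions: Record<string, unknown>;
  initialMessage: string;
  onConversationEnd: (props: ConversationEndProps) => Promise<void>;
  onWriting: (ev: { role: string; text: string }) => void;
}

// Starts the realtime session, records audio, and fires the callbacks above
export type CreateConversation = (options: CreateConversationOptions) => {
  start: () => Promise<void>;
  stop: () => Promise<void>;
};

// Maps recorded tool calls into the shape Roark's callAnalysis.create expects
export type FormatToolInvocations = (invocations: ToolInvocation[]) => unknown[];
```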
Running the Example
Start the conversation with:
```bash
npm run dev
```
The application will:
- Start a real-time voice conversation with the Mastra agent
- Record the entire conversation
- Upload the recording to Cloudinary when the conversation ends
- Send the conversation data to Roark Analytics for analysis
- Display the analysis results
Key Features
- Real-time Speech-to-Speech: Uses OpenAI’s voice models for natural conversation
- Conversation Recording: Captures the entire conversation for later analysis
- Tool Invocation Tracking: Records when and how AI tools are used during the conversation
- Analytics Integration: Sends conversation data to Roark Analytics for detailed analysis
- Cloud Storage: Uploads recordings to Cloudinary for secure storage and access
Customization
You can customize this example by:
- Modifying the agent’s instructions and capabilities
- Adding additional tools for the agent to use (see the sketch after this list)
- Changing the conversation flow or initial message
- Extending the analytics integration with custom metadata
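For example, extra tools can be registered alongside `salutationTool` using the same `createTool` API shown earlier. The `currentTimeTool` below is a hypothetical addition, not part of the example repository:

```typescript
import { createTool } from '@mastra/core/tools';
import { z } from 'zod';

// Hypothetical extra tool -- not part of the original example.
// It follows the same createTool pattern as salutationTool above.
export const currentTimeTool = createTool({
  id: 'currentTimeTool',
  description: 'Report the current server time',
  inputSchema: z.object({ timezone: z.string().optional() }),
  outputSchema: z.object({ time: z.string() }),
  execute: async ({ context }) => {
    return {
      time: new Date().toLocaleTimeString('en-US', { timeZone: context.timezone }),
    };
  },
});
```

A tool defined this way would then be added to the `tools` map in the `Agent` definition next to `salutationTool`.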
To view the full example code, see the GitHub repository.