# Call Analysis with Mastra
This guide demonstrates how to build a complete voice conversation system with analytics using Mastra. The example includes real-time speech-to-speech conversation, recording management, and integration with Roark Analytics for call analysis.
## Overview
The system creates a voice conversation with a Mastra agent, records the entire interaction, uploads the recording to Cloudinary for storage, and then sends the conversation data to Roark Analytics for detailed call analysis.
## Setup
### Prerequisites
- OpenAI API key for speech-to-text and text-to-speech capabilities
- Cloudinary account for audio file storage
- Roark Analytics API key for call analysis
### Environment Configuration
Create a `.env` file based on the sample provided:

```bash
OPENAI_API_KEY=
CLOUDINARY_CLOUD_NAME=
CLOUDINARY_API_KEY=
CLOUDINARY_API_SECRET=
ROARK_API_KEY=
```
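None of these variables have defaults, so a missing one typically surfaces only as a confusing runtime error mid-conversation. A small fail-fast check can catch that at startup — the helper below is a hypothetical addition, not part of the example repo:

```typescript
// Hypothetical startup check (not part of the example repo): verify that
// every variable the example reads is present before the conversation starts.
const REQUIRED_ENV_VARS = [
  "OPENAI_API_KEY",
  "CLOUDINARY_CLOUD_NAME",
  "CLOUDINARY_API_KEY",
  "CLOUDINARY_API_SECRET",
  "ROARK_API_KEY",
] as const;

function missingEnvVars(env: Record<string, string | undefined>): string[] {
  // Treat unset and empty-string values as missing
  return REQUIRED_ENV_VARS.filter((name) => !env[name]);
}

function assertEnv(env: Record<string, string | undefined>): void {
  const missing = missingEnvVars(env);
  if (missing.length > 0) {
    throw new Error(`Missing required environment variables: ${missing.join(", ")}`);
  }
}
```

Calling `assertEnv(process.env)` at the top of the main script, before the conversation starts, turns a mid-call failure into an immediate, readable error.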
### Installation
Install the required dependencies:
```bash
npm install
```
## Implementation
### Creating the Mastra Agent
First, we define our agent with voice capabilities:
```typescript
import { openai } from "@ai-sdk/openai";
import { Agent } from "@mastra/core/agent";
import { createTool } from "@mastra/core/tools";
import { OpenAIRealtimeVoice } from "@mastra/voice-openai-realtime";
import { z } from "zod";

// Define the agent, giving it a realtime voice and a simple greeting tool
export const speechToSpeechServer = new Agent({
  name: "mastra",
  instructions: "You are a helpful assistant.",
  voice: new OpenAIRealtimeVoice(),
  model: openai("gpt-4o"),
  tools: {
    salutationTool: createTool({
      id: "salutationTool",
      description: "Read the result of the tool",
      inputSchema: z.object({ name: z.string() }),
      outputSchema: z.object({ message: z.string() }),
      execute: async ({ context }) => {
        return { message: `Hello ${context.name}!` };
      },
    }),
  },
});
```
### Initializing Mastra
Register the agent with Mastra:
```typescript
import { Mastra } from "@mastra/core";

import { speechToSpeechServer } from "./agents";

export const mastra = new Mastra({
  agents: {
    speechToSpeechServer,
  },
});
```
### Cloudinary Integration for Audio Storage
Set up Cloudinary for storing the recorded audio files:
```typescript
import { v2 as cloudinary } from "cloudinary";

cloudinary.config({
  cloud_name: process.env.CLOUDINARY_CLOUD_NAME,
  api_key: process.env.CLOUDINARY_API_KEY,
  api_secret: process.env.CLOUDINARY_API_SECRET,
});

export async function uploadToCloudinary(path: string) {
  const response = await cloudinary.uploader.upload(path, {
    resource_type: "raw",
  });
  console.log(response);
  return response.url;
}
```
### Main Application Logic
The main application orchestrates the conversation flow, recording, and analytics integration:
```typescript
import { Roark } from "@roarkanalytics/sdk";
import chalk from "chalk";
import fs from "fs";

import { mastra } from "./mastra";
import { uploadToCloudinary } from "./upload";
import { createConversation, formatToolInvocations } from "./utils";

const client = new Roark({
  bearerToken: process.env.ROARK_API_KEY,
});

async function speechToSpeechServerExample() {
  const { start, stop } = createConversation({
    mastra,
    recordingPath: "./speech-to-speech-server.mp3",
    providerOptions: {},
    initialMessage: "Howdy partner",
    onConversationEnd: async (props) => {
      // Write the recording to disk, then upload it to Cloudinary
      fs.writeFileSync(props.recordingPath, props.audioBuffer);
      const url = await uploadToCloudinary(props.recordingPath);

      // Send the call data to Roark for analysis
      console.log("Send to Roark", url);
      const response = await client.callAnalysis.create({
        recordingUrl: url,
        startedAt: props.startedAt,
        callDirection: "INBOUND",
        interfaceType: "PHONE",
        participants: [
          {
            role: "AGENT",
            spokeFirst: props.agent.spokeFirst,
            name: props.agent.name,
            phoneNumber: props.agent.phoneNumber,
          },
          {
            role: "CUSTOMER",
            name: "Yujohn Nattrass",
            phoneNumber: "987654321",
          },
        ],
        properties: props.metadata,
        toolInvocations: formatToolInvocations(props.toolInvocations),
      });

      console.log("Call Recording Posted:", response.data);
    },
    onWriting: (ev) => {
      if (ev.role === "assistant") {
        process.stdout.write(chalk.blue(ev.text));
      }
    },
  });

  await start();

  process.on("SIGINT", async () => {
    await stop();
    process.exit(0); // stop() finalizes the recording; exit once it completes
  });
}

speechToSpeechServerExample().catch(console.error);
```
### Conversation Utilities
The `utils.ts` file contains helper functions for managing the conversation, including:
- Creating and managing the conversation session
- Handling audio recording
- Processing tool invocations
- Managing conversation lifecycle events
## Running the Example
Start the conversation with:
```bash
npm run dev
```
The application will:
- Start a real-time voice conversation with the Mastra agent
- Record the entire conversation
- Upload the recording to Cloudinary when the conversation ends
- Send the conversation data to Roark Analytics for analysis
- Display the analysis results
## Key Features
- **Real-time Speech-to-Speech**: Uses OpenAI's voice models for natural conversation
- **Conversation Recording**: Captures the entire conversation for later analysis
- **Tool Invocation Tracking**: Records when and how AI tools are used during the conversation
- **Analytics Integration**: Sends conversation data to Roark Analytics for detailed analysis
- **Cloud Storage**: Uploads recordings to Cloudinary for secure storage and access
## Customization
You can customize this example by:
- Modifying the agent's instructions and capabilities
- Adding additional tools for the agent to use
- Changing the conversation flow or initial message
- Extending the analytics integration with custom metadata
To view the full example code, see the GitHub repository.