Call Analysis with Mastra
This guide demonstrates how to build a complete voice conversation system with analytics using Mastra. The example includes real-time speech-to-speech conversation, recording management, and integration with Roark Analytics for call analysis.
Overview
The system creates a voice conversation with a Mastra agent, records the entire interaction, uploads the recording to Cloudinary for storage, and then sends the conversation data to Roark Analytics for detailed call analysis.
Setup
Prerequisites
- OpenAI API key for speech-to-text and text-to-speech capabilities
- Cloudinary account for audio file storage
- Roark Analytics API key for call analysis
Environment Configuration
Create a .env file based on the sample provided:
OPENAI_API_KEY=
CLOUDINARY_CLOUD_NAME=
CLOUDINARY_API_KEY=
CLOUDINARY_API_SECRET=
ROARK_API_KEY=
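The example reads these values from process.env at runtime. If you are assembling the project yourself rather than cloning the example, a package such as dotenv can load the .env file before anything else runs; the entry-point import below is an assumption about project setup, not something shown in the example:
import "dotenv/config"; // populate process.env from the .env file before other modules load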
Installation
Install the required dependencies:
npm install
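This installs the dependencies declared in the example's package.json, which include the packages imported throughout this guide: @mastra/core, @mastra/voice-openai-realtime, @ai-sdk/openai, @roarkanalytics/sdk, cloudinary, chalk, and zod.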
Implementation
Creating the Mastra Agent
First, we define our agent with voice capabilities:
import { openai } from "@ai-sdk/openai";
import { Agent } from "@mastra/core/agent";
import { createTool } from "@mastra/core/tools";
import { OpenAIRealtimeVoice } from "@mastra/voice-openai-realtime";
import { z } from "zod";
// Define an agent with realtime voice and a simple greeting tool
export const speechToSpeechServer = new Agent({
name: "mastra",
instructions: "You are a helpful assistant.",
voice: new OpenAIRealtimeVoice(),
model: openai("gpt-4o"),
tools: {
salutationTool: createTool({
id: "salutationTool",
description: "Read the result of the tool",
inputSchema: z.object({ name: z.string() }),
outputSchema: z.object({ message: z.string() }),
execute: async ({ context }) => {
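// context holds the validated input defined by inputSchema (here: name)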
return { message: `Hello ${context.name}!` };
},
}),
},
});
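The OpenAIRealtimeVoice provider is constructed with its defaults here; it is expected to pick up the OpenAI credentials from the OPENAI_API_KEY environment variable configured earlier.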
Initializing Mastra
Register the agent with Mastra:
import { Mastra } from "@mastra/core";
import { speechToSpeechServer } from "./agents";
export const mastra = new Mastra({
agents: {
speechToSpeechServer,
},
});
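Once registered, the agent can be retrieved from the Mastra instance by the key used in the agents map. The lookup below is only an illustration of that pattern; the example itself hands the whole mastra instance to the conversation utilities:
import { mastra } from "./mastra";

// Look up the registered agent by name
const agent = mastra.getAgent("speechToSpeechServer");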
Cloudinary Integration for Audio Storage
Set up Cloudinary for storing the recorded audio files:
import { v2 as cloudinary } from "cloudinary";
cloudinary.config({
cloud_name: process.env.CLOUDINARY_CLOUD_NAME,
api_key: process.env.CLOUDINARY_API_KEY,
api_secret: process.env.CLOUDINARY_API_SECRET,
});
export async function uploadToCloudinary(path: string) {
const response = await cloudinary.uploader.upload(path, {
resource_type: "raw",
});
console.log(response);
return response.url;
}
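Note that the upload response also contains a secure_url field (the HTTPS variant of url); returning that instead may be preferable when the recording link is shared with external services such as Roark.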
Main Application Logic
The main application orchestrates the conversation flow, recording, and analytics integration:
import { Roark } from "@roarkanalytics/sdk";
import chalk from "chalk";
import { mastra } from "./mastra";
import { createConversation, formatToolInvocations } from "./utils";
import { uploadToCloudinary } from "./upload";
import fs from "fs";
const client = new Roark({
bearerToken: process.env.ROARK_API_KEY,
});
async function speechToSpeechServerExample() {
const { start, stop } = createConversation({
mastra,
recordingPath: "./speech-to-speech-server.mp3",
providerOptions: {},
initialMessage: "Howdy partner",
onConversationEnd: async (props) => {
// File upload
fs.writeFileSync(props.recordingPath, props.audioBuffer);
const url = await uploadToCloudinary(props.recordingPath);
// Send to Roark
console.log("Send to Roark", url);
const response = await client.callAnalysis.create({
recordingUrl: url,
startedAt: props.startedAt,
callDirection: "INBOUND",
interfaceType: "PHONE",
participants: [
{
role: "AGENT",
spokeFirst: props.agent.spokeFirst,
name: props.agent.name,
phoneNumber: props.agent.phoneNumber,
},
{
role: "CUSTOMER",
name: "Yujohn Nattrass",
phoneNumber: "987654321",
},
],
properties: props.metadata,
toolInvocations: formatToolInvocations(props.toolInvocations),
});
console.log("Call Recording Posted:", response.data);
},
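// Stream the assistant's transcript to the terminal as it is generated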
onWriting: (ev) => {
if (ev.role === "assistant") {
process.stdout.write(chalk.blue(ev.text));
}
},
});
await start();
process.on("SIGINT", async () => {
await stop();
});
}
speechToSpeechServerExample().catch(console.error);
Conversation Utilities
The utils.ts file contains helper functions for managing the conversation, including:
- Creating and managing the conversation session
- Handling audio recording
- Processing tool invocations
- Managing conversation lifecycle events
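The implementation itself lives in the example repository and is not reproduced here. As a rough orientation only, the callback payloads implied by the main application code look approximately like the sketch below; the names and field types are inferred from how they are used above, not copied from utils.ts:
// Approximate shape of the payloads passed to the conversation callbacks (inferred, not authoritative)
interface ConversationEndProps {
  recordingPath: string; // where the recorded audio is written
  audioBuffer: Buffer; // raw audio of the full conversation
  startedAt: Date; // conversation start time (exact type is defined in utils.ts)
  metadata: Record<string, unknown>; // forwarded to Roark as properties
  toolInvocations: unknown[]; // raw tool call records, passed through formatToolInvocations
  agent: {
    spokeFirst: boolean;
    name: string;
    phoneNumber: string;
  };
}

interface WritingEvent {
  role: "assistant" | "user"; // "assistant" is what onWriting checks; "user" is assumed
  text: string; // incremental transcript text
}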
Running the Example
Start the conversation with:
npm run dev
The application will:
- Start a real-time voice conversation with the Mastra agent
- Record the entire conversation
- Upload the recording to Cloudinary when the conversation ends
- Send the conversation data to Roark Analytics for analysis
- Display the analysis results
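The conversation runs until the process receives SIGINT (Ctrl+C); the handler registered in the main application then stops the session, and onConversationEnd performs the upload and analysis steps listed above.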
Key Features
- Real-time Speech-to-Speech: Uses OpenAI’s voice models for natural conversation
- Conversation Recording: Captures the entire conversation for later analysis
- Tool Invocation Tracking: Records when and how AI tools are used during the conversation
- Analytics Integration: Sends conversation data to Roark Analytics for detailed analysis
- Cloud Storage: Uploads recordings to Cloudinary for secure storage and access
Customization
You can customize this example by:
- Modifying the agent’s instructions and capabilities
- Adding additional tools for the agent to use (see the sketch below)
- Changing the conversation flow or initial message
- Extending the analytics integration with custom metadata
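For instance, an extra tool can be registered alongside salutationTool in the agent's tools map. The weather tool below is purely illustrative and not part of the example:
import { createTool } from "@mastra/core/tools";
import { z } from "zod";

// Illustrative only: a second tool the agent could call during the conversation
const weatherTool = createTool({
  id: "weatherTool",
  description: "Return a canned weather report for a city",
  inputSchema: z.object({ city: z.string() }),
  outputSchema: z.object({ report: z.string() }),
  execute: async ({ context }) => {
    return { report: `It is sunny in ${context.city} today.` };
  },
});

Adding weatherTool next to salutationTool in the tools object passed to the Agent constructor makes it available during the conversation.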
To view the full example code, see the GitHub repository.