Building a voice research agent with Mastra, Render, and AssemblyAI

How ravendr — Render and AssemblyAI's reference voice research agent — combines the AssemblyAI Voice Agent API, Render Workflows, and per-stage Mastra agents to answer in under 60 seconds without dead air.

Ashwin Mudaliar · May 8, 2026 · 8 min read

Earlier this week, our friends at Render and AssemblyAI shipped ravendr: a production voice research agent that classifies, plans, searches, synthesizes, and verifies an answer in under 60 seconds without the audio channel ever going silent.

Ryan Seams (AssemblyAI) and Ojus Save (Render) wrote up the orchestration side of the build. Here's the agent-layer view. Both posts point to the same code.

Voice agents are multi-agent systems

You could give one big LLM all the tools and let it loop. Some teams do exactly that, and it can work for text chat. The harness gets out of the way, and the model figures it out.

But it falls apart for voice, for three reasons:

  • Latency stacks. Tool selection, tool execution, and synthesis run sequentially inside one context window, and the user is waiting on the audio channel the whole time.
  • There's no clean place to verify. If the answer is wrong, you find out when the user does.
  • Failure modes are coupled. A bad classification poisons the rest of the loop, and you can't retry just one stage.

So ravendr breaks the work into stages: classify, plan, search, synthesize, verify. Each stage is its own agent. The voice agent itself — the one talking to the user — lives upstream, in AssemblyAI's Voice Agent API.

The architecture

Three layers, each doing a thing it's good at.

Voice (AssemblyAI). The user's audio streams over a WebSocket to AssemblyAI's Voice Agent API, which handles speech-to-text, the conversational LLM, text-to-speech, and tool-call events. When the model decides to invoke a tool ("research this question for me"), AssemblyAI surfaces that as an event ravendr routes into its research pipeline.

Orchestration (Render Workflows). The research pipeline runs as a Render Workflow task (research), and inside it each stage is its own subtask: classify_ask, plan_queries, search_branch, synthesize, verify. Each subtask gets its own compute plan, timeout, retry policy, and replay log. Render's post covers this in depth.

Agents (Mastra). Inside the classify, plan, synthesize, and verify subtasks, ravendr runs Mastra agents. Each has a single job and opinionated instructions. The agents don't know about Render Workflows, and the Workflow tasks don't know about Mastra internals.

One agent per stage

The Mastra agent factory pattern from src/mastra/agents.ts:

```typescript
import { Agent } from "@mastra/core/agent";

// bare model names default to the Anthropic provider
function normalize(model: string): string {
  return model.includes("/") ? model : `anthropic/${model}`;
}

// CLASSIFY_INSTRUCTIONS is a prompt constant defined elsewhere (not shown in this excerpt)
export function classifierAgent(anthropicModel: string): Agent {
  return new Agent({
    id: "ravendr-classifier",
    name: "ravendr-classifier",
    instructions: CLASSIFY_INSTRUCTIONS,
    model: normalize(anthropicModel),
  });
}
```

The model is a string. Mastra's router parses provider/model-name and dispatches. The agent is constructed per call rather than as a singleton, which keeps model selection in config. Instructions stay tight: the classifier just buckets the user's ask into one of a few output shapes.

The planner, synthesizer, and verifier follow the same factory pattern, with prompts that adapt to whatever shape the classifier returned.

JSON parsing happens in the subtask, where a small helper pulls JSON from the agent's text response and falls back to a sensible default if the model's output is malformed. The same pattern shows up in verify.
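For illustration, a minimal sketch of what that helper could look like, assuming the classifier's output shape has kind and depth fields (the field names are guesses, not ravendr's actual schema):

```typescript
// Hypothetical sketch; field names are assumed, not ravendr's schema
function parseShape(text: string): { kind: string; depth: string } {
  try {
    // pull the first {...} block out of the reply, in case the model
    // wrapped its JSON in prose or a code fence
    const match = text.match(/\{[\s\S]*\}/);
    if (!match) throw new Error("no JSON object found");
    return JSON.parse(match[0]);
  } catch {
    // malformed output falls back to the broadest bucket
    return { kind: "overview", depth: "standard" };
  }
}
```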

If you want to extend this with memory, tools, or scorers, see the Mastra docs.

Durability from Render and intelligence from Mastra

A Mastra agent is just a function. To make it durable — restartable, retryable, observable — ravendr runs each one inside a Render Workflow subtask. Here's classify_ask:

```typescript
import { task } from "@renderinc/sdk/workflows";
import { classifierAgent } from "../../../mastra/agents.js";
// loadWorkflowConfig, parseShape, and events are project helpers;
// their imports are omitted in this excerpt

export const classify_ask = task(
  {
    name: "classify_ask",
    plan: "starter", // small compute plan: this stage is one short LLM call
    timeoutSeconds: 30,
    retry: { maxRetries: 2, waitDurationMs: 500, backoffScaling: 1.5 },
  },
  async function classify_ask(sessionId: string, topic: string) {
    const config = loadWorkflowConfig();
    const agent = classifierAgent(config.ANTHROPIC_MODEL);
    const result = await agent.generate(`User ask: "${topic}"\n\nClassify. JSON only.`);
    const shape = parseShape(result.text ?? "");
    await events.publish({ sessionId, kind: "ask.classified", shape });
    return { shape };
  },
);
```

Durability comes from Render and intelligence from Mastra, and the two layers don't leak into each other. Each handoff is data rather than control flow: the classifier returns a shape, the planner takes that shape and returns queries, and the synthesizer takes search results and returns a briefing.
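Sketched as TypeScript types, the handoffs look something like this (angle, query, and tier match the search fan-out shown later; the other fields are illustrative):

```typescript
// Hypothetical handoff types between stages
type Shape = { kind: string; depth: string };        // classify_ask → plan_queries
type Plan = {
  queries: { angle: string; query: string; tier: string }[]; // plan_queries → search
};
type Branch = { angle: string; findings: string };   // search_branch → synthesize
type Briefing = { text: string; sources: string[] }; // synthesize → verify
```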

For the full task definitions, plan selection, and dashboard view, see Ryan and Ojus's post.

The voice channel: AssemblyAI Voice Agent API

The user's voice never touches Mastra or Render directly. AssemblyAI's Voice Agent API handles the WebSocket, the STT, the conversational model, and the TTS, while ravendr's job is to be a tool the voice agent can call.

When AssemblyAI's voice agent decides to invoke the research tool, the voice_session Render task receives the tool-call event, kicks off the research chain, and streams progress back. AssemblyAI's voice agent stays alive in the meantime, saying "let me look that up for you" while the heavy work runs in parallel.

The browser-to-task connection uses a reverse WebSocket pattern: the browser opens a WebSocket to a broker, the Render task opens one to the same broker, and the broker pairs them. Browser and task end up connected without either knowing the other's address.
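Neither post spells out the broker's internals, but the pairing logic fits in a short sketch, assuming the ws package and a session ID passed in the query string:

```typescript
import { WebSocketServer, WebSocket } from "ws";

// Minimal pairing broker (a sketch, not ravendr's actual broker)
const wss = new WebSocketServer({ port: 8080 });
const waiting = new Map<string, WebSocket>();

wss.on("connection", (socket, req) => {
  const sessionId = new URL(req.url ?? "", "http://broker").searchParams.get("session");
  if (!sessionId) {
    socket.close();
    return;
  }
  const peer = waiting.get(sessionId);
  if (!peer) {
    waiting.set(sessionId, socket); // first side waits for its counterpart
    return;
  }
  waiting.delete(sessionId);
  // pipe frames both ways; neither side ever learns the other's address
  socket.on("message", (data) => peer.send(data));
  peer.on("message", (data) => socket.send(data));
});
```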

Every stage gets a budget

The deadline isn't a single 60-second timeout but a budget that cascades down the pipeline.

From src/render/tasks/research.ts:

```typescript
const OVERALL_BUDGET_MS = 55_000; // 55s leaves ~5s for broker + browser render
const SYNTH_RESERVE_MS  = 12_000; // minimum runway to write the briefing
const VERIFY_RESERVE_MS = 6_000;  // minimum runway for verify
const RETRY_RESERVE_MS  = 35_000; // don't attempt a retry unless this much is left
```

Synthesis needs 12 seconds, verify needs 6, and a retry only fires if at least 35 seconds remain. Each stage checks the clock and either runs or steps aside.
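"Checks the clock" is close to literal. A sketch of the deadline helper the stages consult (the remaining name appears in the fan-out code below; the rest is assumed):

```typescript
// Assumed helper: milliseconds left in the overall budget
const startedAt = Date.now();
const remaining = () => OVERALL_BUDGET_MS - (Date.now() - startedAt);

// each stage guards itself before running, e.g.:
const canVerify = remaining() >= VERIFY_RESERVE_MS; // otherwise ship unverified
const canRetry = remaining() >= RETRY_RESERVE_MS;   // otherwise take the first answer
```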

Search is where the design earns its keep. The planner returns N queries (3 for a simple ask, up to 40 for an exhaustive one), and ravendr races them in parallel against whatever budget is left:

```typescript
const searchBudget = Math.max(5_000, remaining() - SYNTH_RESERVE_MS);
const branches = await racePartial(
  plan.queries.map((q) => search_branch(sessionId, q.angle, q.query, q.tier)),
  searchBudget,
);
```

racePartial returns whichever branches resolve before the budget runs out. If 30 are in flight and only 18 finish, the synthesizer works from those 18 while the rest keep running as orphan subtasks.
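racePartial's implementation isn't shown in either post, but those semantics fit in a few lines. A sketch under the assumed behavior:

```typescript
// Sketch: resolve with whatever settled inside the budget; the rest keep running
async function racePartial<T>(promises: Promise<T>[], budgetMs: number): Promise<T[]> {
  const settled: T[] = [];
  const tracked = promises.map((p) => p.then((v) => { settled.push(v); }));
  let timer: NodeJS.Timeout | undefined;
  await Promise.race([
    Promise.allSettled(tracked), // resolves early if everything finishes
    new Promise((resolve) => { timer = setTimeout(resolve, budgetMs); }),
  ]);
  clearTimeout(timer);
  return [...settled]; // snapshot, so late finishers don't mutate the result
}
```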

If nothing comes back in time, the pipeline doesn't error. It seeds the synthesizer with an "overview" branch carrying "no research results came back in time" so the synthesizer still writes a "couldn't find anything" briefing. Silence is the failure mode users notice first, and a partial answer is recoverable.

Verify, retry once, default to ship

After synthesize, the verifier runs on the same factory pattern. It reads the briefing and decides whether it actually answers the question.

The verdict gets parsed like the classifier's, and if parsing fails, the verifier defaults to pass:

```typescript
} catch (err) {
  logger.warn({ err, raw: text.slice(0, 200) },
    "verify: unparseable verdict, defaulting to pass");
  return { passes: true, reason: "verifier output unreadable — defaulting to pass", feedback: "" };
}
```

A misbehaving guard fails open. The pipeline already has a synthesized response, so the verifier serves as a guard rather than a gate.

When the verifier returns a clean fail, the feedback goes to the planner rather than the synthesizer. The next loop iteration calls plan_queries with the verifier's feedback baked in: new queries get drafted, new searches run, and a new synthesis happens. One retry only; then ravendr ships whatever it has.
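Stitched together, the loop looks roughly like this (a sketch; the stage signatures are assumed from the snippets above):

```typescript
// Rough shape of the plan → search → synthesize → verify loop
let feedback = "";
let briefing;
for (let attempt = 0; attempt < 2; attempt++) {
  const plan = await plan_queries(sessionId, topic, feedback);
  const branches = await racePartial(
    plan.queries.map((q) => search_branch(sessionId, q.angle, q.query, q.tier)),
    Math.max(5_000, remaining() - SYNTH_RESERVE_MS),
  );
  briefing = await synthesize(sessionId, branches);
  const verdict = await verify(sessionId, topic, briefing);
  if (verdict.passes || remaining() < RETRY_RESERVE_MS) break;
  feedback = verdict.feedback; // the fail feeds the planner, not the synthesizer
}
// ship whatever the last pass produced
```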

Streaming progress to the UI

The audio channel can't fill 55 seconds of dead air alone. While the voice agent says "let me look that up for you," the UI shows phase-level progress: ask.classified, plan.ready, youcom.call.started, verify.started, briefing.ready. Each Render subtask publishes events to a Postgres-backed event bus, and the browser subscribes via SSE through the reverse-tunnel broker.
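On the browser side, the subscription is a few lines of standard EventSource (the endpoint path here is assumed):

```typescript
// Browser side; the endpoint path and renderPhase are hypothetical
const source = new EventSource(`/api/sessions/${sessionId}/events`);
source.onmessage = (e) => {
  const event = JSON.parse(e.data); // e.g. { kind: "plan.ready", ... }
  renderPhase(event.kind);          // update the phase indicator in the UI
};
```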

SSE is old, but voice UX needs visual compensation when the audio channel is busy.

Concurrency, deployment, env

ravendr caps concurrent sessions at 100, gives each a 15-minute TTL, and runs a cleanup daemon that retires expired sessions. Deployment is a render.yaml defining two services (web and workflow), a Postgres for session state and the event bus, and four env vars: ANTHROPIC_API_KEY, ASSEMBLYAI_API_KEY, YOUCOM_API_KEY, plus a Render API key. Full setup is in the ravendr README.
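The cleanup daemon is a sweep on a timer. Assuming sessions live in a Postgres table with a created_at column (the actual schema isn't shown in either post), it could be as small as:

```typescript
import { Pool } from "pg";

const db = new Pool(); // connection details come from the PG* env vars
const SWEEP_INTERVAL_MS = 60_000;

// retire sessions past their 15-minute TTL
setInterval(() => {
  db.query("DELETE FROM sessions WHERE created_at < now() - interval '15 minutes'")
    .catch((err) => console.error("session sweep failed", err));
}, SWEEP_INTERVAL_MS);
```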

What this gives you

The voice channel and the agent work belong on different schedules. AssemblyAI handles the audio while Render Workflows runs the heavy stuff in parallel, so the user never hears silence.

Each research stage is a small agent rather than a slice of one giant prompt, and Mastra's per-stage agent factories enable that.

Every stage gets a budget. The pipeline is opinionated about graceful degradation: if the search fan-out runs over, it synthesizes from what came back; if the verifier can't be parsed, it still ships the briefing; if there's no time for a retry, it takes the first answer. It always produces something.

Render and AssemblyAI shipped ravendr as a reference build. Clone it, deploy it, talk to it. Add memory, tools, or scorers when your build calls for it.

```bash
git clone https://github.com/render-examples/ravendr
```

For the orchestration deep-dive, read Ryan and Ojus's post. And check out our docs for the agent layer.

Ashwin Mudaliar

Ashwin Mudaliar works on GTM at Mastra, finding ways to help partners support developers as they ship agents into production. A Stanford grad, he also enjoys yoga and surfing.
