Today we're incredibly excited to announce Mastra Harness.
We've been working on this for almost six months. The team built MastraCode, a TUI-based coding agent we use every day. Then we extracted the best parts into the Harness class so you can build your own harness.
Think of the Harness as the layer around the agent loop. It gives you a conversation you can watch, interrupt, and steer; memory suitable for long runs; storage that persists sessions; control over tool calls; task delegation to subagents; and modes you switch between with specialized agents.
Let's dive in.
How Harness works
The most fundamental concept of Mastra's Harness is the Session that persists state across turns. You can switch between plan and build mode, with per-mode tools, models, and instructions, plus a plan-approval workflow.
Manage threads? Yes, you can. Harness has a thread lifecycle: create, switch, rename, delete, and clone.
Spawn subagents? Yup, including forking to re-use the cache prefix.
Multi-turn? You got it. Harness has a loop with follow-up queuing and steer().
Ask users questions? We have a built-in ask_user tool.
Context management happens via Mastra's industry-leading observational memory system.
My personal favorite part is how easy we make it for you to display state to the user.
There's a lot going on under the hood, so Harness exposes a pub/sub event system. It emits 35 signals, including a display_state_changed event with a state payload and event types like agent_start, tool_input_delta, tool_suspended, subagent_text_delta, follow_up_queued, usage_update, and thread_changed.
This all reduces into a HarnessDisplayState object, which can be consumed by web, mobile, or TUI clients, with fields like currentMessage, activeTools, pendingApproval, pendingSuspensions, activeSubagents, and tasks.
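For intuition, the pub/sub side can be pictured as a small emitter that fans each event out to every subscriber. This is only a sketch and not how Harness is implemented: `TinyEmitter`, `SketchEvent`, and the payload fields are made-up names, and only the event type strings are borrowed from the list above.

```typescript
// Hypothetical sketch of pub/sub fan-out; not the Harness internals.
type SketchEvent =
  | { type: "agent_start" }
  | { type: "usage_update"; totalTokens: number } // payload field is an assumption
  | { type: "thread_changed"; threadId: string }; // payload field is an assumption

type Listener = (event: SketchEvent) => void;

class TinyEmitter {
  private listeners: Listener[] = [];

  // Register a listener and return a function that removes it again.
  subscribe(listener: Listener): () => void {
    this.listeners.push(listener);
    return () => {
      this.listeners = this.listeners.filter((l) => l !== listener);
    };
  }

  // Deliver one event to every current listener, in subscription order.
  emit(event: SketchEvent): void {
    this.listeners.slice().forEach((listener) => listener(event));
  }
}

const emitter = new TinyEmitter();
const tuiLog: string[] = [];
const webLog: string[] = [];

const detachTui = emitter.subscribe((e) => tuiLog.push(e.type));
emitter.subscribe((e) => webLog.push(e.type));

emitter.emit({ type: "agent_start" });
detachTui(); // the TUI goes away; the web client keeps listening
emitter.emit({ type: "usage_update", totalTokens: 1200 });
```

Both clients see the first event, and only the web client sees the second. Returning an unsubscribe function from `subscribe` is a design choice of this sketch.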
When should I use Harness?
You're asking the right question. Early in 2025 we spent a lot of time talking about agents vs LLMs as a spectrum like self-driving cars. Single-turn vs multi-turn, tool calling, loop vs one-shot -- all of these were indications you needed an agent, not just an LLM.
Now harness vs agent is a similar spectrum. Here are signs you should be using Harness:
- If it’s supposed to be long-running or autonomous
- If it’s a colleague you’re having long conversations with
- If you’re sending it off for more than a few minutes to do a task
- If it’s writing and executing a lot of code, especially in a loop
Deep dive
Let's walk through some Harness concerns, then I'll send you over to the docs.
Initialize
Here's a quick snippet for getting started. Define a couple of different agents, then create a harness with modes:
import { Harness } from "@mastra/core/harness";
import { Agent } from "@mastra/core/agent";
import { Memory } from "@mastra/memory";
import { LibSQLStore } from "@mastra/libsql";
const planAgent = new Agent({...});
const executeAgent = new Agent({...});
export const harnessDemo = new Harness({
id: "harness-demo",
storage: new LibSQLStore({...}),
memory: new Memory({...}),
subagents: [...],
modes: [
{ id: "plan", name: "Plan", metadata: { default: true }, agent: planAgent },
{ id: "execute", name: "Execute", agent: executeAgent }
]
});
await harnessDemo.init();
// switch from planning to doing
await harnessDemo.switchMode({ modeId: "execute" });
Subscribe to channels
Most agent APIs are one-shot: you send a prompt, the model streams a reply. But the harness keeps the conversation open.
You subscribe to a thread once, then send messages in and read everything the agent emits back (every message, tool call, and state change) as a single event stream that stays open for as long as the thread lives.
harnessDemo.subscribe((event) => {
if (event.type === "message_update") {
console.log(event.message);
}
});
This lets you interrupt a running task, queue a follow-up message, or steer it mid-run, and if the process ends, re-attach to the thread and pick up where you left off.
await harnessDemo.selectOrCreateThread();
Several clients can subscribe to a thread at once: a terminal, a web UI, a Slack bot. More than one person can work with the same agent at the same time.
await harnessDemo.sendMessage({ content: "Hello!" });
Subscribe to state
When an agent runs, it emits a stream of low-level deltas — text fragments, reasoning deltas, tool-call starts, tool results. With a standard agent, you have to sort through them to track the current state of the run.
With the harness, those deltas are folded into a single display state — collapsing the noise and providing a clearer picture. You can build your UI/TUI around the state — using it to provide user feedback, or ask when input is required.
harnessDemo.subscribe((event) => {
if (event.type !== "display_state_changed") return;
const state = harnessDemo.getDisplayState();
if (state.isRunning) {
// the agent is working — update your UI
}
// the agent's live to-do list
// each task: { id, content, status: "pending" | "in_progress" | "completed", activeForm }
console.log(state.tasks);
});
State includes running tools, token usage, pending approvals, subagent activity, memory progress, and more. See the Harness class reference for a full list of state options.
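If "folding deltas into a display state" sounds abstract, it helps to picture a reducer: a pure function that takes the previous state plus one event and returns the next state. The sketch below is an illustration under assumptions, not the Harness implementation. The field names isRunning, currentMessage, and activeTools come from this post, but their shapes, and the event names text_delta, tool_start, tool_end, and agent_end, are invented for the example.

```typescript
// Hypothetical sketch: event and state shapes are simplified assumptions.
type DeltaEvent =
  | { type: "agent_start" }
  | { type: "text_delta"; text: string }
  | { type: "tool_start"; toolCallId: string; toolName: string }
  | { type: "tool_end"; toolCallId: string }
  | { type: "agent_end" };

interface DisplayStateSketch {
  isRunning: boolean;
  currentMessage: string;
  activeTools: { toolCallId: string; toolName: string }[];
}

const initialState: DisplayStateSketch = {
  isRunning: false,
  currentMessage: "",
  activeTools: [],
};

// Pure reducer: previous state plus one event gives the next state.
function reduceDisplayState(state: DisplayStateSketch, event: DeltaEvent): DisplayStateSketch {
  switch (event.type) {
    case "agent_start":
      return { isRunning: true, currentMessage: "", activeTools: [] };
    case "text_delta":
      return { ...state, currentMessage: state.currentMessage + event.text };
    case "tool_start":
      return {
        ...state,
        activeTools: [...state.activeTools, { toolCallId: event.toolCallId, toolName: event.toolName }],
      };
    case "tool_end":
      return {
        ...state,
        activeTools: state.activeTools.filter((t) => t.toolCallId !== event.toolCallId),
      };
    case "agent_end":
      return { ...state, isRunning: false };
  }
}

const deltas: DeltaEvent[] = [
  { type: "agent_start" },
  { type: "text_delta", text: "Reading " },
  { type: "tool_start", toolCallId: "call-1", toolName: "read_file" },
  { type: "text_delta", text: "the config." },
  { type: "tool_end", toolCallId: "call-1" },
];

const midRun = deltas.reduce(reduceDisplayState, initialState);
const finished = reduceDisplayState(midRun, { type: "agent_end" });
```

A UI built this way never has to inspect individual deltas. It re-renders from whatever the latest state object says.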
Persist state
A harness holds more state than a single agent run — the conversation history, the active mode, models, token usage, and memory settings. For a long-running agent to survive restarts, crashes or tabs closing, state can't be stored in the running process.
The harness preserves state using Mastra's built-in storage — saved using a threadId. Configure a default store and the whole session is persisted.
import { Harness } from "@mastra/core/harness";
import { MastraCompositeStore } from "@mastra/core/storage";
import { LibSQLStore } from "@mastra/libsql";
import { WorkflowsPG, ScoresPG } from "@mastra/pg";
import { ObservabilityStorageClickhouseVNext } from "@mastra/clickhouse";
const harnessDemo = new Harness({
// ...
storage: new MastraCompositeStore({
id: "harness-storage",
default: new LibSQLStore({...}),
domains: {
observability: new ObservabilityStorageClickhouseVNext({...}),
workflows: new WorkflowsPG({...}),
scores: new ScoresPG({...}),
}
})
});
Approve tool calls
A standard Mastra agent can request approval before a tool runs — for every tool, or just the ones you choose. But it asks again on every call.
With a harness, an approval carries across the session. Grant permission for a single tool or a whole category, and the agent won't ask again.
harnessDemo.subscribe((event) => {
if (event.type === "tool_approval_required") {
// surface event.toolName to the user, then send back their decision
// (userChoice is a placeholder for whatever your UI collected)
harnessDemo.respondToToolApproval({
decision: userChoice // "approve" | "decline" | "always_allow_category"
});
}
});
You can also set rules up front for safe categories like read.
// allow every read without prompting
harnessDemo.setPermissionForCategory({ category: "read", policy: "allow" });
Or skip the asking entirely. Flip on YOLO and every tool call just runs.
// auto-approve everything in the run
await harnessDemo.setState({ yolo: true });
You can still keep one guardrail. Set a tool's policy to deny, and the harness won't ever run it.
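It can help to see how these rules might stack. Here's a rough sketch of a resolver that decides whether a tool call runs, gets blocked, or triggers a prompt. Treat it as a mental model only: `resolvePermission`, `rememberDecision`, the `PermissionState` shape, and the tool and category names `write_file`, `edit`, and `execute` are all invented here. The only ordering this post actually states is that a tool-level deny beats YOLO. The rest of the precedence is a guess.

```typescript
// Hypothetical sketch of rule precedence; not the Harness internals.
type Policy = "allow" | "deny";
type Outcome = "run" | "block" | "ask";
type Decision = "approve" | "decline" | "always_allow_category";

interface PermissionState {
  yolo: boolean;
  toolPolicies: Record<string, Policy>;
  categoryPolicies: Record<string, Policy>;
}

function resolvePermission(state: PermissionState, toolName: string, category: string): Outcome {
  const toolPolicy = state.toolPolicies[toolName];
  // 1. An explicit per-tool deny always wins, even under YOLO.
  if (toolPolicy === "deny") return "block";
  // 2. YOLO auto-approves everything that is left (assumed ordering).
  if (state.yolo) return "run";
  // 3. Per-tool and per-category rules skip the prompt (assumed ordering).
  if (toolPolicy === "allow") return "run";
  const categoryPolicy = state.categoryPolicies[category];
  if (categoryPolicy === "allow") return "run";
  if (categoryPolicy === "deny") return "block";
  // 4. Nothing matched, so the user has to decide.
  return "ask";
}

// Model "always_allow_category" as writing an allow rule that lasts for the session.
function rememberDecision(state: PermissionState, category: string, decision: Decision): PermissionState {
  if (decision !== "always_allow_category") return state;
  return { ...state, categoryPolicies: { ...state.categoryPolicies, [category]: "allow" } };
}

let permissions: PermissionState = {
  yolo: false,
  toolPolicies: { dangerous_tool: "deny" },
  categoryPolicies: {},
};

const firstCall = resolvePermission(permissions, "write_file", "edit");
permissions = rememberDecision(permissions, "edit", "always_allow_category");
const secondCall = resolvePermission(permissions, "write_file", "edit");
const underYolo = resolvePermission({ ...permissions, yolo: true }, "dangerous_tool", "execute");
```

In this run the first `write_file` call asks, the second runs without a prompt, and `dangerous_tool` stays blocked with YOLO on. In the harness itself, the deny rule is a single call: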
// block a tool — it can never run, even under YOLO
harnessDemo.setPermissionForTool({ toolName: "dangerous_tool", policy: "deny" });
Asking the user questions
Approvals gate a tool call, but sometimes the agent needs to pause and ask you a question. The built-in ask_user tool pauses the run, emits a tool_suspended event, and waits for an answer.
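Conceptually, pause-and-resume means parking the tool's continuation under its tool call id until someone answers. Below is a minimal sketch of that idea with a callback registry. `SuspensionRegistry` and its methods are hypothetical, and the event shape is a guess modeled on the fields used in this post (toolCallId, toolName, suspendPayload).

```typescript
// Hypothetical sketch of pause-and-resume; names and shapes are assumptions.
interface SuspendedEvent {
  type: "tool_suspended";
  toolCallId: string;
  toolName: string;
  suspendPayload: { question: string };
}

type Resume = (resumeData: string) => void;

class SuspensionRegistry {
  private pending: Record<string, Resume> = {};

  constructor(private notify: (event: SuspendedEvent) => void) {}

  // Park the continuation under the tool call id, then tell subscribers.
  suspend(toolCallId: string, question: string, onResume: Resume): void {
    this.pending[toolCallId] = onResume;
    this.notify({
      type: "tool_suspended",
      toolCallId,
      toolName: "ask_user",
      suspendPayload: { question },
    });
  }

  // Resume at most once; unknown or already-answered ids are ignored.
  respond(toolCallId: string, resumeData: string): boolean {
    const resume = this.pending[toolCallId];
    if (!resume) return false;
    delete this.pending[toolCallId];
    resume(resumeData);
    return true;
  }
}

const seenSuspensions: SuspendedEvent[] = [];
const registry = new SuspensionRegistry((event) => seenSuspensions.push(event));

let toolResult = "";
registry.suspend("call-7", "Which branch should I target?", (answer) => {
  toolResult = `user said: ${answer}`;
});

const accepted = registry.respond("call-7", "main");
const duplicate = registry.respond("call-7", "develop"); // ignored, already answered
```

Keying by tool call id is what lets several questions be outstanding at once without answers getting crossed. With the harness, the equivalent wiring looks like this: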
harnessDemo.subscribe((event) => {
if (event.type === "tool_suspended" && event.toolName === "ask_user") {
const { question } = event.suspendPayload; // show the question, get an answer
// answer below is a placeholder for the reply your UI collected
harnessDemo.respondToToolSuspension({
toolCallId: event.toolCallId,
resumeData: answer
});
}
});
Spawn isolated and forked subagents
Add subagents, each with its own instructions, model, tools, and a description of what it does. The harness builds an agent from each and exposes them as a built-in tool. Your running agent (planAgent or executeAgent, the parent) calls that tool mid-task, picking a subagent by its description.
By default a subagent is isolated: it inherits none of the parent's context, so it reasons without influence. Here's what that looks like:
import { Harness } from "@mastra/core/harness";
import { createTool } from "@mastra/core/tools";
const readFileTool = createTool({...});
const grepTool = createTool({...});
const harnessDemo = new Harness({
// ...
subagents: [
{
id: "reviewer-agent",
name: "Reviewer Agent",
description: "Reviews the diff for bugs",
instructions: "Review the diff. Be skeptical, assume nothing.",
defaultModelId: "anthropic/claude-opus-4-6",
tools: { readFileTool, grepTool }
}
]
});
But sometimes context matters. And when it does, you can set a subagent to forked: true. The harness clones the parent's conversation and runs the subagent as the parent. The model picks up where it left off on a warm prompt cache.
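The difference between the two comes down to which messages the subagent starts with. Here's a small sketch of that. `buildSubagentContext` and `sharesPrefix` are hypothetical helpers and the message shape is simplified. It also assumes the provider's prompt cache matches on a shared message prefix, which is how prompt caching commonly works but is not something this post spells out.

```typescript
// Hypothetical sketch: message shapes and function names are assumptions.
interface Message {
  role: "system" | "user" | "assistant";
  content: string;
}

type SubagentSpec =
  | { forked: true }
  | { forked: false; instructions: string };

function buildSubagentContext(parent: Message[], spec: SubagentSpec, task: string): Message[] {
  const taskMessage: Message = { role: "user", content: task };
  if (spec.forked) {
    // Forked: copy the parent's history unchanged and append the task,
    // so the request starts with the prefix the parent already sent.
    return [...parent, taskMessage];
  }
  // Isolated: start clean with only the subagent's own instructions.
  return [{ role: "system", content: spec.instructions }, taskMessage];
}

// True when `prefix` matches the start of `messages` element by element.
function sharesPrefix(prefix: Message[], messages: Message[]): boolean {
  return (
    prefix.length <= messages.length &&
    prefix.every((m, i) => m.role === messages[i].role && m.content === messages[i].content)
  );
}

const parentHistory: Message[] = [
  { role: "system", content: "You are the execute agent." },
  { role: "user", content: "Refactor the auth module." },
  { role: "assistant", content: "I'll start by mapping the call sites." },
];

const forkedContext = buildSubagentContext(parentHistory, { forked: true }, "List every caller of login().");
const isolatedContext = buildSubagentContext(
  parentHistory,
  { forked: false, instructions: "Review the diff. Be skeptical, assume nothing." },
  "Review the latest diff.",
);
```

The forked context begins with the parent's exact history, which is what makes a cache hit possible. The isolated one shares nothing with it. Configuring a forked subagent on the harness looks like this: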
import { Harness } from "@mastra/core/harness";
import { Agent } from "@mastra/core/agent";
import { Memory } from "@mastra/memory";
const planAgent = new Agent({...});
const executeAgent = new Agent({...});
const harnessDemo = new Harness({
// ...
memory: new Memory({...}), // forked subagents require memory
modes: [
{ id: "plan", name: "Plan", metadata: { default: true }, agent: planAgent, transitionsTo: "execute" },
{ id: "execute", name: "Execute", agent: executeAgent }
],
subagents: [
{
id: "explorer-agent",
name: "Explorer Agent",
description: "Explores the codebase",
forked: true // reuses the active mode's agent — the parent
}
]
});
Modes
A standard agent runs one setup — one set of instructions, one model, one set of tools. To make it plan first and execute second, you'd run two separate agents and wire up the handoff yourself.
A mode bundles that setup — an agent and its model — under a name you switch to. Plan and Execute live in the same session. You switch with switchMode, and the conversation carries straight over.
A mode can also say where it hands off next. Give the plan mode a transitionsTo, and once the plan is approved the session switches into execute on its own.
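You can think of this as a tiny state machine that sits on top of one shared conversation. The sketch below is illustrative only: `ModeSession`, `approvePlan`, and the `isDefault` flag are names invented for the example, and the real approval flow involves more than a single method call.

```typescript
// Hypothetical sketch of mode switching; not the Harness internals.
interface ModeSketch {
  id: string;
  transitionsTo?: string;
  isDefault?: boolean;
}

class ModeSession {
  private currentId: string;
  // One shared history: switching modes does not reset the conversation.
  readonly history: string[] = [];

  constructor(private modes: ModeSketch[]) {
    if (modes.length === 0) throw new Error("At least one mode is required");
    const initial = modes.filter((m) => m.isDefault)[0] || modes[0];
    this.currentId = initial.id;
  }

  current(): string {
    return this.currentId;
  }

  switchMode(modeId: string): void {
    if (!this.modes.some((m) => m.id === modeId)) {
      throw new Error(`Unknown mode: ${modeId}`);
    }
    this.currentId = modeId;
  }

  // Approving the plan follows the current mode's transitionsTo, if it has one.
  approvePlan(): void {
    const mode = this.modes.filter((m) => m.id === this.currentId)[0];
    if (mode.transitionsTo) this.switchMode(mode.transitionsTo);
  }
}

const modeSession = new ModeSession([
  { id: "plan", isDefault: true, transitionsTo: "execute" },
  { id: "execute" },
]);

modeSession.history.push("user: add rate limiting to the API");
const modeBefore = modeSession.current();
modeSession.approvePlan();
const modeAfter = modeSession.current();
```

The session starts in plan, moves to execute on approval, and the history array is untouched by the switch. On the harness, that hand-off is declared in the mode config: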
import { Harness } from "@mastra/core/harness";
import { Agent } from "@mastra/core/agent";
const planAgent = new Agent({...});
const executeAgent = new Agent({...});
const harnessDemo = new Harness({
// ...
modes: [
// approve the plan and the session flips to execute
{ id: "plan", name: "Plan", metadata: { default: true }, agent: planAgent, transitionsTo: "execute" },
{ id: "execute", name: "Execute", agent: executeAgent }
]
});
Observational memory
Finally, the crown jewel: memory. We've talked a lot about why compaction sucks. Mastra's industry-leading observational memory solves this by keeping structured observations instead of compacting to a lossy summary.
As the conversation grows, an observer agent compresses messages into observations; once the observations pile up, a reflector agent condenses them, merging related items while remembering what matters:
import { Harness } from "@mastra/core/harness";
import { Memory } from "@mastra/memory";
import { LibSQLStore } from "@mastra/libsql";
const harnessDemo = new Harness({
// ...
memory: new Memory({
storage: new LibSQLStore({...}),
options: {
observationalMemory: true
}
})
});
Get started
Harness is available in @mastra/core@1.43.0 or later. Visit the docs to get started.
