Excited to announce that Mastra, the TypeScript agent framework, is moving into beta.
We’re a lot of the former Gatsby team, and we’re building Mastra (https://mastra.ai), an open-source JavaScript agent framework on top of Vercel’s AI SDK.
The backstory here: Abhi Aiyer, Shane Thomas and I were working on an AI-powered CRM but it felt like we were having to roll all the AI bits (agentic workflows, evals, RAG) ourselves.
We also noticed our friends getting stuck debugging prompts, figuring out why their agents called (or didn’t call) tools, and writing lots of custom memory retrieval logic.
At some point we just looked at each other and were like, why aren't we trying to make this part easier? Here's a demo video.
One thing we heard from folks is that seeing the input/output of every step, of every run of every workflow, is very useful. So we took XState and built a workflow graph primitive on top with OTel tracing. We wrote the APIs to make control flow explicit: `.step()` for branching, `.then()` for chaining, and `.after()` for merging. We also added `.suspend()`/`.resume()` for human-in-the-loop.
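Here's a minimal sketch of that control flow (the step definitions are simplified, and import paths and `Step` signatures have shifted between releases, so treat this as illustrative):

```typescript
import { Workflow, Step } from "@mastra/core";

// Illustrative steps; real steps also take input/output schemas,
// and execute() receives a context object.
const extract = new Step({ id: "extract", execute: async () => ({ rows: 42 }) });
const transform = new Step({ id: "transform", execute: async () => ({ ok: true }) });
const load = new Step({ id: "load", execute: async () => ({ done: true }) });

const workflow = new Workflow({ name: "etl" })
  .step(extract)     // entry step
  .then(transform)   // chained: runs once extract completes
  .after(extract)    // branch off extract...
  .step(load)        // ...so load runs in parallel with transform
  .commit();         // finalize the graph
```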
We abstracted the main RAG verbs like `.chunk()`, `.embed()`, `.upsert()`, `.query()`, and `.rerank()` across document types and vector DBs. We shipped an eval runner with evals like completeness and relevance, plus the ability to write your own.
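End to end, the pipeline looks roughly like this (a sketch: the Postgres store, index name, and embedding model are stand-ins, and package names have moved around across releases):

```typescript
import { MDocument } from "@mastra/rag";
import { embed, embedMany } from "ai";
import { openai } from "@ai-sdk/openai";
import { PgVector } from "@mastra/pg";

// Parse and chunk a document (strategy and sizes are illustrative)
const doc = MDocument.fromText("…long source text…");
const chunks = await doc.chunk({ strategy: "recursive", size: 512, overlap: 50 });

// Embed the chunks via the AI SDK
const { embeddings } = await embedMany({
  model: openai.embedding("text-embedding-3-small"),
  values: chunks.map((chunk) => chunk.text),
});

// Upsert into a vector store (pgvector here; swap in your own)
const store = new PgVector(process.env.POSTGRES_URL!);
await store.createIndex({ indexName: "docs", dimension: 1536 }); // once
await store.upsert({ indexName: "docs", vectors: embeddings });

// Embed the question and query for the nearest chunks
const { embedding: queryVector } = await embed({
  model: openai.embedding("text-embedding-3-small"),
  value: "What does the contract say about renewals?",
});
const hits = await store.query({ indexName: "docs", queryVector, topK: 5 });
```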
Then we read the MemGPT paper and implemented agent memory on top of the AI SDK with a `lastMessages` key, `topK` retrieval, and a `messageRange` for surrounding context (think `grep -C`).
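Configuration-wise, those knobs surface as options on the memory object. A sketch (option names follow the docs of this era and may differ in later releases):

```typescript
import { Memory } from "@mastra/memory";

const memory = new Memory({
  options: {
    lastMessages: 20,    // always include the 20 most recent messages
    semanticRecall: {
      topK: 4,           // retrieve the 4 most similar past messages
      messageRange: 2,   // plus 2 messages on either side (grep -C style)
    },
  },
});
```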
But we still weren’t sure whether our agents were behaving as expected, so we built a local dev playground that lets you curl agents/workflows, chat with agents, view evals and traces across runs, and iterate on prompts with an assistant. The playground uses a local storage layer powered by libsql (thanks Turso team!) and runs on localhost with `npm run dev` (no Docker).
Mastra agents originally ran inside a Next.js app. But we noticed that AI teams’ development was increasingly decoupled from the rest of their organization, so we made sure you can also run Mastra as a standalone endpoint or service.
Some things people have been building so far: one user automates support for an iOS app he owns with tens of thousands of paying users. Another bundled Mastra inside an Electron app that ingests aerospace PDFs and outputs CAD diagrams. Another is building WhatsApp bots that let you chat with objects like your house.
We (for now) have adopted an Elastic v2 license. The agent space is pretty new, and we wanted to let users do whatever they want with Mastra while preventing, e.g., AWS from grabbing it.
We believe that any developer should be able to build and productize a human-level agent or assistant. It should be as easy to build an agent as it is to build a website.
Beta release notes
To recap, agents are a layer on top of LLM calls that maintain state, make decisions, and use tools to accomplish tasks. Think of them as stateful workers that can reason about problems and take actions.
Mastra Agents have access to tools, workflows, and synced data, enabling them to perform complex tasks and interact with external systems. Agents can invoke your custom functions, utilize third-party APIs through integrations, and access knowledge bases you have built.
Here are some things we added in the beta:
`generate()` and `stream()` APIs
We introduced `generate()` and `stream()` to simplify LLM calls:
- `generate()` returns a single, synchronous completion along with metadata (tokens, usage, etc.).
- `stream()` provides partial outputs in real time for chat-like interfaces.
Both methods work with Mastra’s agent architecture and OTel logging. This keeps your prompt, memory, and tool usage all in one place.
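For example (a sketch: model wiring is shown via the AI SDK, and the exact agent config and import paths have evolved across releases):

```typescript
import { Agent } from "@mastra/core";
import { openai } from "@ai-sdk/openai";

const agent = new Agent({
  name: "support",
  instructions: "You answer questions about our product.",
  model: openai("gpt-4o-mini"),
});

// One-shot completion; the result also carries metadata (tokens, usage, etc.)
const result = await agent.generate("How do I reset my password?");
console.log(result.text);

// Token-by-token output for chat-like interfaces
const stream = await agent.stream("Walk me through account setup.");
for await (const chunk of stream.textStream) {
  process.stdout.write(chunk);
}
```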
Advanced agent memory
We have added a number of different backends, as well as memory compression:
- Hierarchical Memory Storage: Organize context in layers (recent, mid-term, long-term).
- Long-Term Compression: Summarize older context while preserving key details.
- Vector Search Memory: Embed historical data and retrieve relevant context on demand.
This setup prevents memory bloat, speeds up retrieval, and keeps agent context focused.
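Put together, a memory setup layering all three might look like this (a sketch: the libsql storage adapter and the `workingMemory` option name are drawn from the docs and may differ by release):

```typescript
import { Memory } from "@mastra/memory";
import { LibSQLStore } from "@mastra/libsql";

const memory = new Memory({
  // durable backend for conversation history (local file; no Docker)
  storage: new LibSQLStore({ url: "file:./memory.db" }),
  options: {
    lastMessages: 20,                              // recent layer
    semanticRecall: { topK: 4, messageRange: 2 },  // vector-search layer
    workingMemory: { enabled: true },              // compressed long-term summary
  },
});
```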
Tool registry support via MCP
We added support for tools under MCP (Model Context Protocol):
- Unified Registry: Declare all tools in one place instead of wiring them manually.
- Access Control: Restrict which tools each agent can use to maintain safety and permissions.
- Visual Listing: `mastra dev` shows each tool’s methods and parameters for quick reference.
See the Tools guide for details on how to configure MCP in your code.
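A sketch of what the registry looks like in code (the server entry is illustrative, and the configuration class name has varied across releases):

```typescript
import { MCPConfiguration } from "@mastra/mcp";

// Declare every MCP server in one place...
const mcp = new MCPConfiguration({
  servers: {
    filesystem: {
      command: "npx",
      args: ["-y", "@modelcontextprotocol/server-filesystem", "./docs"],
    },
  },
});

// ...then hand the discovered tools to an agent's `tools` field,
// restricting per-agent as needed
const tools = await mcp.getTools();
```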
Workflows: orchestrate complex tasks
Most AI applications need more than a single call to a language model. You may want to run multiple steps, conditionally skip certain paths, or even pause execution altogether until you receive user input. Sometimes your agent’s tool calling is not accurate enough on its own.
Workflows let you not only control the general flow of task execution but also add checkpoints: moments when computation is suspended (so a human can provide feedback or guidance to the agent) before the workflow is resumed and ultimately completed.
We’ve built several workflow patterns in Mastra that you can add to projects and customize.
Better workflow control flow APIs
AI engineering is, in practice, building production-ready ETL pipelines, and it should be described with dataflow nouns and verbs. Users say things like “it kind of feels like how I’d have done it if I’d rolled my own.”
In Mastra, that looks like `workflow.step()` for step creation, `workflow.then()` for chaining, and `workflow.after()` for branching and coalescing branches.
Suspend/resume
We also introduced a suspend/resume mechanism that allows you to pause a workflow partway through, gather additional data or human feedback, and then continue exactly where you left off. This is particularly valuable for tasks that rely on asynchronous user interaction or third-party API responses that may arrive minutes (or even hours) later.
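A sketch of a human-in-the-loop checkpoint (signatures are approximate; in particular, how the resume payload reaches the step context has varied):

```typescript
import { Workflow, Step } from "@mastra/core";

const draft = new Step({
  id: "draft",
  execute: async () => ({ text: "Proposed reply…" }),
});

const review = new Step({
  id: "review",
  execute: async ({ context, suspend }) => {
    // Pause here until a human weighs in (flag name is illustrative)
    if (!context?.approved) await suspend();
    return { approved: true };
  },
});

const workflow = new Workflow({ name: "approval" })
  .step(draft)
  .then(review)
  .commit();

// Kick off a run; it suspends at the review step
const { runId, start } = workflow.createRun();
await start();

// Hours later, once feedback arrives, resume exactly where we left off
await workflow.resume({ runId, stepId: "review", context: { approved: true } });
```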
RAG (Retrieval Augmented Generation)
Retrieval Augmented Generation (RAG) is essential for grounding AI responses in factual data. Here are some things we added in alpha:
Standardized APIs to process and embed documents
We noticed that every team was writing their own document ingestion code. So we introduced clear patterns and example code for parsing, chunking, and embedding text from various sources (PDFs, HTML, Markdown, etc.). You can define chunk sizes and embedding configurations in your `mastra.config.ts`, which makes it straightforward to integrate with vector stores like libsql, Pinecone, or pgvector for retrieval-augmented workflows.
Chunking and embedding strategies for optimal retrieval
We’ve introduced multiple chunking strategies—including semantic chunking (where we break up text by semantic boundaries rather than just raw token counts) and sliding window approaches. Plus, Mastra will help you keep track of overlaps, so important context isn’t lost in mid-sentence breaks.
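Both styles hang off the same `chunk()` call; here the sizes are illustrative, and “semantic” boundaries come from format-aware splitters like the Markdown strategy:

```typescript
import { MDocument } from "@mastra/rag";

const markdownSource = "# Contracts\n\nRenewals happen annually…";
const doc = MDocument.fromMarkdown(markdownSource);

// Sliding-window style: fixed-size chunks with overlap so context
// isn't lost at mid-sentence breaks
const windowed = await doc.chunk({ strategy: "recursive", size: 512, overlap: 64 });

// Semantic-boundary style: split on document structure (headings,
// sections) rather than raw token counts
const semantic = await doc.chunk({ strategy: "markdown" });
```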
Reranking
Finally, we added a reranking layer that sits between your vector search results and the final output. Mastra will take the top-N results and run them through a reranking algorithm to reorder them by likely relevance.
With reranking, you can drastically improve answer correctness without having to guess at hyperparameters in your vector search. There’s a built-in reranking function you can toggle on in `mastra.config.ts`. Or you can provide your own custom logic.
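The rerank step slots in right after the vector query (a sketch; the argument shape is approximate, so check the RAG docs for the exact signature):

```typescript
import { rerank } from "@mastra/rag";
import { openai } from "@ai-sdk/openai";

// `hits` stands in for the topK results of a vector-store query
// like the one shown earlier
declare const hits: { id: string; score: number; metadata?: Record<string, unknown> }[];

const reranked = await rerank(
  hits,
  "What does the contract say about renewals?", // the original query
  openai("gpt-4o-mini"),                        // judge model
  { topK: 5 }                                   // keep the 5 best after reordering
);
```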
Mastra dev: from playground to prod
Mastra dev is your local playground for experimenting with agents, tools, and workflows in one place. Spin it up with a single command to interact with your agents in real time. You can observe each step of the agent’s decision-making, review prompts and responses, and debug function calls on the fly.
This setup lets you quickly iterate on prompts, workflows, and integration logic.
Mastra dev includes:
- an agents playground so you can chat with your agents
- `/generate` and `/stream` endpoints for each of your agents so you can test them via curl or call them over the network
- visual workflow diagrams for each workflow
- endpoints for each of your workflows so you can test them via curl or call them over the network
- a registry of all of your tools
Here's Shane demoing Mastra Dev.
Built-in evals and prompt CMS
Evals are automated tests that evaluate LLM outputs using model-graded, rule-based, and statistical methods. Each eval returns a normalized score between 0 and 1 that can be logged and compared. They can also be customized with your own prompts and scoring functions.
We built a `@mastra/evals` package for systematic agent evaluation, including metrics for answer relevancy, completeness, and tone consistency. We also integrated it with `@mastra/core` for standardized evaluation hooks.
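Here's the shape of one of those metrics in use (a sketch; `AnswerRelevancyMetric` is one of the shipped model-graded metrics, and the judge model here is a placeholder):

```typescript
import { openai } from "@ai-sdk/openai";
import { AnswerRelevancyMetric } from "@mastra/evals/llm";

// Model-graded metric: how relevant is the output to the input?
const relevancy = new AnswerRelevancyMetric(openai("gpt-4o-mini"));

const result = await relevancy.measure(
  "How do I reset my password?",                // input
  "Go to Settings > Security and click Reset."  // agent output
);
console.log(result.score); // normalized 0-1

// Attach metrics to an agent via its `evals` field to get the
// standardized evaluation hooks.
```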
We added them to the `mastra dev` console too. [see the gif I tweeted]
Mastra Create
Build projects in a standalone way using `npm create mastra`.
This will create a new project scaffold with directories and (if you want) examples. We've found this to be a cleaner DX for new users.
While you can still initialize within an existing project (`mastra init`), `mastra create` generates a ready-to-use project scaffold, complete with recommended directories, configuration, and boilerplate.
Getting started & sharing feedback
We're very excited to see what you'll build. Please `npm create mastra@latest` and tell us everything you love or hate.
Next up
Our blog and Twitter accounts are good places to find the latest on Mastra. We're also constantly updating our docs and adding new working code examples to help anyone get started with Mastra.
Lastly, consider joining our workshop series. Every week, we'll be tackling a new piece of the puzzle: agents, evals, RAG. We'll build something in under an hour.