Why Fireworks uses Mastra in their agentic runtime

Fireworks AI is an inference platform that runs open-source models and processes over 140 billion tokens daily for companies like Uber, DoorDash, and Cursor. Founded in October 2022 by Lin Qiao and Dmytro Dzhulgakov, former Meta engineers who built and ran PyTorch, Fireworks recently closed a $52 million Series B at a valuation of over $500 million.

Rolling their own framework

In September 2024, Matt Apperson, a staff software engineer at Fireworks, was tasked with defining the company's agentic vision. The goal was to build a hosted agentic system with the control and programmability that Fireworks' customers had been requesting.

"I'm a big fan of using state machines for agentic flows," Matt explains. "I think they make a lot of sense, but they're not quite perfect, they're a little verbose." The team was prototyping a three-layer architecture: a state machine foundation, an agentic runtime on top of that, then an API layer.

But finding the right foundation proved challenging. "There wasn't really a lot out there that was not Python based," Matt recalls. He felt that existing solutions like LangChain were "either too much or too little," and that they left no room for other pieces of the TypeScript ecosystem.

So Fireworks started building their own. Matt was 75% of the way through implementing a custom state machine system when he found Mastra.

Discovering Mastra

Matt came across Mastra while browsing GitHub to get a lay of the land, and he was immediately impressed: "You guys were already using XState, which we were too."

The timing couldn't have been better. "Mastra basically handed us that first layer in a nice neat little bow," Matt explains. When he found Mastra, he threw out the state machine he'd written.

Using Mastra workflows as their foundation, the Fireworks team built AIML, an open-source framework that lets developers build multi-step agentic systems using only prompts with special XML tags.

"Instead of sending just text as a system prompt to Fireworks API, you can send these AIML style prompts," Matt explains. "It just uses declarative XML tags within a prompt just like you use XML normally in a prompt, just now they are more functional."

AIML workflow visualization showing state graph with Incoming Request, Think, Answer, and Stream Response nodes

The system works in three stages: it parses XML-tagged prompts, breaks them down into document-order flows based on SCXML state graphs, then dynamically composes Mastra workflows to execute the logic. The result: complex agentic workflows that can be built and iterated on without writing code.
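The parse-then-compose idea can be illustrated with a small sketch. This is not the actual AIML parser and it does not use the Mastra workflow API; the tag names (`think`, `answer`), the `parseSteps` and `composeWorkflow` helpers, and the handler signature are all hypothetical, and the SCXML state-graph stage is skipped entirely. It only shows tagged blocks being read in document order and turned into a sequential flow.

```typescript
// Hypothetical sketch: not the real AIML parser or the Mastra workflow API.
type Step = { tag: string; prompt: string };
type StepHandler = (prompt: string, earlierOutputs: string[]) => string;

// Extract top-level tagged blocks in the order they appear in the prompt.
function parseSteps(source: string): Step[] {
  const pattern = /<([a-z]+)>([\s\S]*?)<\/\1>/g;
  const steps: Step[] = [];
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(source)) !== null) {
    steps.push({ tag: match[1], prompt: match[2].trim() });
  }
  return steps;
}

// Compose the parsed steps into a single function that runs them in order.
function composeWorkflow(
  steps: Step[],
  handlers: Record<string, StepHandler>
): () => string[] {
  return () => {
    const outputs: string[] = [];
    for (const step of steps) {
      const handler = handlers[step.tag];
      if (!handler) throw new Error(`No handler for <${step.tag}>`);
      // Each handler sees the outputs produced by the steps before it.
      outputs.push(handler(step.prompt, outputs));
    }
    return outputs;
  };
}

// Example prompt with two made-up tags, executed in document order.
const examplePrompt = `
<think>Plan a reply.</think>
<answer>Write the reply.</answer>
`;

const parsedSteps = parseSteps(examplePrompt);
const runWorkflow = composeWorkflow(parsedSteps, {
  think: (p) => `thought: ${p}`,
  answer: (p, earlier) => `answer: ${p} (${earlier.length} earlier output)`,
});
const workflowOutputs = runWorkflow();
```

In a real system the handlers would call a model rather than format strings, and the ordering would come from a state graph rather than a flat list.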

"The problem we're solving is that product teams often work in TypeScript while AI experts work in Python," Matt explains. "With AIML, you don't need to convert between languages. It's a whole lot easier to hand a prompt back and forth than it is to take a Python notebook and convert that to TypeScript."

AIML code examples showing Simple control flow, Real-time streaming, and State & Context features

The Mastra team worked closely with Matt during development. "The biggest pain point was the concept around being able to pass values from one step to the next," Matt said. The Fireworks team had wrapped Mastra workflows in a class to automatically pass values between steps, and shared this feedback with the Mastra team, who "implemented it quickly, but not just like, slapping it together, but really kind of thoughtfully applying the feedback from a first principles perspective."
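To make the value-passing problem concrete, here is a minimal sketch of the kind of wrapper described above. It is neither the Fireworks wrapper class nor Mastra's API: the `StepChain` class, the `ChainContext` type, and the merge-into-a-shared-context approach are assumptions chosen for illustration, and the steps are synchronous for brevity where real steps would usually be asynchronous.

```typescript
// Hypothetical sketch: neither the Fireworks wrapper nor Mastra's API.
type ChainContext = Record<string, unknown>;
type ChainStep = (ctx: ChainContext) => ChainContext;

class StepChain {
  private readonly steps: ChainStep[] = [];

  // Register a step; returning `this` allows chained calls.
  add(step: ChainStep): this {
    this.steps.push(step);
    return this;
  }

  // Run the steps in order, merging each step's result into the shared
  // context so that later steps can read values from earlier ones.
  run(initial: ChainContext): ChainContext {
    let ctx: ChainContext = { ...initial };
    for (const step of this.steps) {
      ctx = { ...ctx, ...step(ctx) };
    }
    return ctx;
  }
}

const chain = new StepChain()
  .add((ctx) => ({ draft: `draft for ${String(ctx.question)}` }))
  .add((ctx) => ({ answer: String(ctx.draft).toUpperCase() }));

const chainResult = chain.run({ question: "hi" });
```

The second step never receives `draft` explicitly; the wrapper carries it forward, which is the convenience the Fireworks team was after.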

Early traction and focus shift

Since launch, some of Fireworks' largest customers have begun using AIML. Building on Mastra also allowed the team to shift its focus from infrastructure to user experience.

"It allowed us to take a step back and stop worrying about the internals," Matt reflects.

The team is currently working on real-time tracing capabilities and preparing for broader production use.
