AI Agents Hour
Join Mastra cofounders Shane Thomas and Abhi Aiyer for weekly conversations about the latest in AI.
They discuss breaking AI news, chat with guests from the industry, and go deep on the technical challenges of building AI agents.
Latest Episodes
April 2, 2026
#76
Anthropic Leaked Their Own Source Code, OpenAI Raised $122b, and Axios Got Hacked (This Week In AI)
Shane and Abhi bring you your weekly roundup of AI news! Claude Code's entire source code leaked via an exposed .map file in npm — 512,000 lines of TypeScript, 50K GitHub stars before DMCAs started flying. What people found: Claude Code uses ~20 tools, and there's a regex that silently logs user frustration to analytics. Same week, a CMS misconfiguration exposed a draft blog post revealing Mythos and Capybara — a new model tier above Opus described as posing "unprecedented cybersecurity risks." Fortune separately confirmed a source saying Opus 5 is "so good it poses a danger."
March 26, 2026
#75
Claude Uses Your Computer, OpenAI Buys Python Tools & The Cursor/Kimi Plot Twist (This Week In AI)
Shane and Abhi kick off with a viral quote: if your $500K engineer isn't burning $250K in tokens, something is wrong. OpenAI is acquiring Astral, the team behind uv and Ruff, who will join the Codex team: OpenAI bets on Python, while Anthropic bet on TypeScript with Bun. Then Cursor drama: someone found that Composer 2 is powered by Kimi K2.5, Kimi confirmed it, and then raised another $1B at an $18B valuation, its third round in 90 days. Anthropic shipped Claude Code Channels (Telegram/Discord control), Cowork Dispatch (a persistent agent you can message from your phone), and a deep dive on how they use Skills. Matt Pocock found quality drops past 100K tokens on the 1M context window, and a demo of Claude using your computer racked up 52 million views (Mac only). Stripe launched MPP for agent-to-agent payments, Better Auth launched the Agent Auth Protocol, and Cloudflare shipped Dynamic Workers for running AI-generated code in isolates. LangChain open-sourced Deep Agents, Composio shipped 30-parallel-agent orchestration, OpenCode lost its Claude Max plugin after Anthropic sent lawyers, and Netlify and Google Stitch entered vibe coding and design. EsoLang-Bench shows LLMs scoring 85–95% on standard benchmarks but collapsing to 0–11% on esoteric languages: memorization, not reasoning. Quick hits: GPT-5.4 mini/nano, Minimax M2.7, Morph FlashCompact, AI CMO, Letta pivots to coding agents, GLM-OCR, and a LiteLLM supply chain attack.
March 24, 2026
#74
Email Broke Productivity - It's Time To Fix It (with Brett and Naveen from Micro)
Brett Goldstein and Naveen Sreekandan from Micro join Shane and Abhi to talk about why they believe the future of productivity looks completely different from what we have today. Micro is an all-in-one productivity platform: email client, CRM, calendar, tasks, docs, meeting notes, and a powerful AI agent, all built on a unified graph where every object (like emails, people, companies, meetings, documents) is interconnected. The thesis is simple but bold: email isn't just a list of messages to get through. It's the world's most-used CRM, travel app, hiring tool, and developer notification system. Micro restructures that data so each use case actually feels like the right tool for the job — your sales pipeline as a Kanban board, your GitHub notifications as a task board, your contacts fully enriched from every email and meeting you've ever had. Brett walks us through the demo: the daily orchestrator automation that audits itself, updates its own prompt, generates your day plan, and has even prepped talking points for this interview. Context docs let the agent know everything it needs. The CRM auto-fills and auto-updates from emails and meeting notes. The X integration lets the agent pull recent posts from anyone you're about to meet. Naveen covers the architecture: built on Mastra, using agent and workflow primitives on top of a graph-based data model backed by Postgres with a custom query layer called Prism. One main agent with dynamic context injection handles both chat and automations — the agent knows whether it's in automation mode (just give the output) or chat mode (ask follow-up questions). Supermemory powers vector search. Dedicated sub-agents handle specific workflows, such as email labeling and meeting note summarization.
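The "one main agent with dynamic context injection" pattern Naveen describes can be sketched in a few lines. This is a generic illustration, not Micro's actual code; the type and function names below are invented for the example.

```typescript
// Generic sketch of mode-aware context injection: one agent, two invocation
// modes, different injected system context. Names here are illustrative.

type Mode = "chat" | "automation";

interface AgentContext {
  system: string;       // system prompt assembled for this invocation
  askFollowUps: boolean; // whether the agent may ask clarifying questions
}

// Build the injected context based on how the agent was invoked.
function buildContext(mode: Mode, contextDocs: string[]): AgentContext {
  const base = `Relevant context:\n${contextDocs.join("\n")}`;
  if (mode === "automation") {
    // Automation mode: just produce the output, no clarifying questions.
    return {
      system: `${base}\nReturn only the final output.`,
      askFollowUps: false,
    };
  }
  // Chat mode: conversational, may ask follow-up questions.
  return {
    system: `${base}\nAsk follow-up questions when unsure.`,
    askFollowUps: true,
  };
}
```

The same agent definition then serves both chat and scheduled automations; only the injected context changes per call.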
March 20, 2026
#73
Two Lines of Code to Lock Down Your Agents - Mastra Studio Auth
Mastra Studio started as a local playground for developers to test agents and workflows without having to spin up a custom UI. But as the feature set grew, teams started asking: how do we share this with non-technical teammates? How do we control what different users can do? Ryan, an engineer at Mastra, walks through the new Mastra Studio Auth — now baked directly into Studio. Starting with simple token-based auth (two lines of config), you can lock down your Studio from the open internet. From there, RBAC lets you map roles to granular permissions — 80 auto-generated permissions derived directly from Studio's routes and handlers, controllable via wildcard patterns. Out-of-the-box providers include WorkOS, Auth0, Supabase, Firebase, and Clerk, with GitHub and others in open PRs. The team also discusses what's coming next: audit logs so you can see exactly what an agent did, why it accessed a given tool, and whether it should have. Auth for agents in production isn't magic — your tool files still need to check permissions — but Mastra handles the plumbing so you can focus on building securely.
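The setup Ryan walks through might look roughly like this in config. This is a hypothetical sketch, not the verbatim Mastra API: the option names (`auth`, `token`, `roles`) and the wildcard permission strings are illustrative placeholders, so check the Mastra Studio Auth docs for the real shape.

```typescript
// Hypothetical sketch of Studio auth config; option names are illustrative,
// not the exact Mastra API.
import { Mastra } from "@mastra/core";

export const mastra = new Mastra({
  server: {
    // The "two lines": token-based auth to lock Studio down.
    auth: {
      token: process.env.STUDIO_TOKEN,

      // RBAC: map roles onto the auto-generated, route-derived
      // permissions using wildcard patterns.
      roles: {
        viewer: ["agents.*.read", "workflows.*.read"],
        admin: ["*"],
      },
    },
  },
});
```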
March 18, 2026
#72
NVIDIA GTC, The Death of MCP, and AI Agents Are Hiring Humans - This Week in AI
Shane hosts this week's news from his usual studio while Abhi joins remotely from NVIDIA GTC 2026 in San Jose. Jensen Huang's keynote set the tone: NVIDIA is doubling down on AI factories, pushing 100x more token throughput, and helping bring OpenAI onto AWS infrastructure.
March 12, 2026
#71
Meta Acquires Moltbook, OpenAI Releases GPT-5.4, TypeScript Is #1 on GitHub (This Week In AI)
A lot happened in eight days. Meta acquired Moltbook, a social network built entirely for AI agents, not humans. OpenAI dropped GPT-5.4 Thinking and GPT-5.4 Pro, Codex got forks for multi-agent workflows and Windows support, and there are rumblings of OpenAI building a GitHub alternative. Anthropic fired back hard — multi-agent PR code review for Claude Code, while loops via /loop, the Claude Marketplace, and a way to pull your context from other AI tools.
March 10, 2026
#70
The Biggest Threat to AI Agents (with Ismail Pelaseyed)
Ismail Pelaseyed from Superagent is back on AI Agents Hour, and this time he's talking about something most builders aren't thinking about yet: supply chain attacks on AI agents. Guardrails protect against what you tell your agent to do. But what about everything your agent reads, fetches, and installs on its own? That's the gap Brin is built to fill.
March 4, 2026
#69
Missile Strikes Disrupt AWS and Claude, Anthropic Banned from US Government, Cloudflare vs Vercel
This week in AI saw geopolitical turmoil, major funding news, and a shift in software development. Missile strikes in the UAE and Bahrain disrupted AWS and Claude services. Meanwhile, after Anthropic banned its models from autonomous weapons and mass surveillance, the Trump administration banned Anthropic from government contracts—posing a major supply chain risk. On the same day, Sam Altman secured a deal with the Department of War as OpenAI announced a $110 billion funding round, highlighting a sharp contrast in approaches.
March 1, 2026
#68
How to Build Reliable AI Agents with Datasets, Experiments, and Error Analysis
Yujohn from Mastra explains why datasets and experiments are essential for building production-grade AI agents. If you're building an agent, you need a way to verify it's working correctly before and after you make changes. Datasets provide that baseline. You create a collection of test cases (ground truth) that represent the scenarios your agent should handle. Then you run experiments: pass each test case through your agent and measure the results. This is error analysis in practice. You start by identifying where your agent fails, then build scorers to quantify those failure modes over time. Smaller teams often ship first and add datasets later, once they have user feedback. Larger teams need them earlier. But eventually, every production agent needs this. The demo shows how Mastra makes this accessible. You can create datasets through the UI, add items manually or import from CSV, and run experiments with a single click. The results show you exactly what went wrong: which tool calls failed, what the agent output was, and how it compared to ground truth. You can also compare experiments side by side to see if your prompt tweaks actually improved things. And because all the data lives in your own database, you can write your own agents to analyze the results, dig into traces, and iterate. The SDK makes it easy to integrate into CI/CD: run experiments on pull requests, gate deployments on eval scores, or just collect data from production and curate datasets later.
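The dataset-to-experiment loop Yujohn describes can be sketched in a few lines. The agent and scorer here are stubs with invented names; in practice you would call your real agent and use the Mastra SDK's dataset and experiment APIs rather than this hand-rolled loop.

```typescript
// Minimal sketch of the dataset -> experiment -> score loop described above.

interface DatasetItem {
  input: string;
  expected: string; // ground truth for this test case
}

type Scorer = (output: string, expected: string) => number; // 0..1

// Exact-match scorer; real scorers might check tool calls or use an LLM judge.
const exactMatch: Scorer = (output, expected) => (output === expected ? 1 : 0);

// Run every dataset item through the agent and average the scores.
async function runExperiment(
  dataset: DatasetItem[],
  agent: (input: string) => Promise<string>,
  score: Scorer,
): Promise<number> {
  let total = 0;
  for (const item of dataset) {
    const output = await agent(item.input);
    total += score(output, item.expected);
  }
  return total / dataset.length;
}
```

In CI, a loop like this can gate a pull request: run the experiment against the candidate prompt and fail the build if the average score drops below a threshold.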
