Opus 4.5 vs Gemini 3, Agents in Slack, Phone a Farmer, Tony from Runloop - 50th episode!
It's a showdown between Opus 4.5 and Gemini 3! Today we discuss our first impressions, what we're hearing, and how the new models stack up on the benchmarks. In the future, will there be agents in every Slack? How is Mastra v1 coming along? What other AI news is going on? Abhi and Shane talk with others from the Mastra team to find out! We also chat with Tony from Runloop.
Episode Transcript
Hello everyone and welcome to AI Agents Hour. I'm Shane. I'm here with Obby. What up?
And we're not doing this on Monday, although we normally do. Today's Tuesday, but it is a special day today because it's our 50th episode. Who would have thought? Yeah, dude. 50 episodes deep.
Yeah. Who would have thought we made it this long? I'm still surprised. And we still, you know,
clearly if you're watching, we barely know what we're doing, but we're here. We show up every week, and hopefully you all do as well. This is live. So if you're watching it live, you're on X,
you're on, you know, YouTube, you're on LinkedIn, drop comments. We will chat along the way. We do have a pretty awesome episode, but I thought before we dive into that, do you have
any highlights from the last, you know, it's almost been a year. It's been the better part of a year for sure. 50 episodes. What's the vibe? How are you feeling? What are your favorite moments?
Uh, we've got to talk to a lot of great people working in the industry with us, many of them friends, and some of them we'd never met before, right? So it's been really cool. Last week's episode was good. We've been having some bangers, you know. Yeah, I do feel like they're getting more fun at
least. I don't know if they're getting better, but I'm having more fun doing it. Yeah, they're definitely more fun, and I think by definition they're getting better for sure. Yeah, it's way more fun, because in the beginning we just live streamed random stuff. We didn't have a show structure, which is
nice. Yeah. Yeah. Now we actually, you know, have a semblance of an agenda when we come in. We try to bring in
interesting people, you know, from across the space to teach us things, so we can learn and ask questions. And yeah, we talk about some of the drama, of course, that's always fun, but also just the general news and what's going on. I found out that my parents watch this show when I saw them this
weekend. No way. So, I'm guessing your first reaction was, "Oh, maybe I shouldn't swear as much." That's exactly what they told me, dude.
That's exactly what they told me. They were like, "Hey, you know, you shouldn't swear so much." I knew it. I don't even know your parents, but I know your parents. Yeah, dude. That's exactly it.
Dude, that's kind of funny. I told him I can't change who I am. You know,
you got to lean into it, you know. I guess, you know, if you're listening in the chat, what do you think? What are your favorite moments from the 50-episode series so far? Um, you know, I
think the guests are the coolest part, because we just get to talk to people, and it's a little bit different. Of course, we've met some of them in person, we see them at events, but it's also nice just to chat, and you're doing it in public, so you can ask them some things. The
amount of things that they'll share openly is always really interesting. Whether it's, you know, Professor Andy or Alli, some of the recurring guests, but also some of the people that we just talk to once, you learn a bunch and get to ask some interesting questions. I think that's one of the more fun things. And then just, yeah, talking crap about
all the AI drama that's always spawning in the news. Hashim says, "The MCP drama episodes." Oh, those are recent. Yeah, those are in the last couple months.
Last month or so, actually. Uh, did you hear how the MCP debate went? I didn't actually tune in. We had Alli on last
week. We talked about the MCP debate. So, speaking of MCP drama. Yeah. Let's just say it was
inconclusive. Um, as expected. Yeah. Inconclusive.
So both sides won. So yeah, I mean, both sides got some attention for sure. You know, I don't want to say it's like politics, but in some ways it is: if you already came in with a certain view, you just supported the person that had your opinion, right? And
I'm guessing they were both pretty measured, right? It was probably a pretty civil debate, but I think you kind of just pick a side, like, "Yeah, that was a point for us. That was a point for that side." So yeah, I can't wait for the episode. I
want to see the full episode. Um, but they were talking on X like MCP is here to stay. So yeah, where was it going to go? I don't think it's going away for sure. Uh, unrelated to the 50th
episode, there was another big npm attack. Oh, dude, this one is so good. I just saw the link where you could refresh and see all the different repos that were getting hit. Yeah, I can talk about this one. I didn't do too much research other than I know there was a ton of big
names that were impacted, which is never good. So, for people listening, when these things happen, it's called a supply chain attack. What is a supply chain attack? It's when
an attacker hits different processes of your app. It could be your CI/CD pipeline, it could be production, anything. It's essentially hijacking keys and then running stuff. But how do they do that? They have to infiltrate you somehow, and the attack vector is npm packages, because there's so much cruft in npm and people are downloading modules from all over the
place. So what you do is you compromise a small module somewhere in the dependency graph that everyone has to install, and then you can get into backdoor channels there, because when npm install runs, it can run pre-install scripts and post-install scripts, where the attacker then just jumps in and steals all your... Uh, damn. I promised I wasn't going to cuss this
episode, but I already screwed up. But um, you did valiantly. You made it like 11 minutes. I did pretty good. I did pretty good. So, this particular
supply chain attack has been ongoing, and it's called Shai-Hulud, like from Dune. It's a cool name, and honestly it's very smart. I have a link; I'll just put it in the chat. Y'all can go geek out
over this, because sometimes I feel evil or something, because I'm thinking about it from the attacker's perspective. I'm like, damn, how clever are you? But then all these people are compromised, and there's probably a bunch of credential theft happening for anyone using some of those modules. So that's not good.
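For readers who want the mechanics: the attack described above rides on npm lifecycle scripts. When `npm install` runs, any package in your dependency graph can declare `preinstall`/`postinstall` hooks that execute arbitrary code on your machine or CI runner. A hypothetical malicious package manifest might look like this (the package name and payload script are invented for illustration):

```json
{
  "name": "tiny-string-utils",
  "version": "1.4.2",
  "scripts": {
    "postinstall": "node collect.js"
  }
}
```

where the hypothetical `collect.js` would read environment variables such as CI and npm tokens and send them to an attacker-controlled endpoint. Common mitigations include installing with `npm install --ignore-scripts` and pinning dependencies with a lockfile.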
Yeah. I mean, it seems like this stuff happens more and more. npm is obviously a big player, but this happens all the time across different attack surfaces.
And I guess that's why security is important, but even when you think you're secure, you're probably not as secure as you think. Yeah. That's how you get your app held for ransom. Yeah. Which happens.
Does it happen? It does happen. Uh, this does remind me, you know, and we have a guest coming soon, so we're going to bring that guest on. But related to that,
I've never actively held anything for ransom, but I did play a prank once. My old boss, a long time ago, not the most technical person, and we were running a small development team. We thought, we should just play a prank on him. He's a fun guy. We had a good time, we had a good relationship. So, I
logged into his printer and printed out a message, like a McAfee antivirus message, basically saying, your network's compromised, here's what you have to do. And he was freaking out and took it seriously. He came in and showed me, and then I pulled up CNN and
modified the HTML, because you can do that and he doesn't know. I'm like, yeah, it's already on CNN, and our name is on there. Did you know this? Like, the name of the company. And he was losing his
mind for like half an hour. He thought the company was done. The whole thing. It was a lot of money. I don't know. Terrible prank to pull on someone. That is a really bad one, dude.
So I'm not the most proud of it. I was young. I was young, tech-savvy, and dumb.
So, you know, Carter, if you're listening, I'm sorry. It was very funny for the team, though. The team loved it.
You did not. Uh, I don't pull pranks like that anymore. So, someday I will be on the receiving end. Karma's gonna come
back and get me for that. I know. But anyways, uh, with that, again, this is live. Leave us some comments. Leave us some chat messages. If you're not watching this live, if you're
listening later, watching on Spotify, or listening on Apple Podcasts, please give us a review. We do like the five-star reviews. If you don't want to leave a five-star review, please find something else to do and don't leave a review at all, you know. But we do appreciate the
five stars. Today we have Tony from Runloop coming on. We've got a new segment I'm calling Phone a Farmer, where we bring on Ward from the team, who's our resident farmer. Oh yeah. We're going to talk about an agent in every Slack with Grayson from the Mastra
team. We're going to talk AI news. Obviously, there are some big things to talk about here
between Opus, Gemini, and a whole bunch of other things. And we're going to have a whole bunch of people from the Mastra team to celebrate. It's the 50th episode. We want to have a lot of guests from the team that maybe haven't been on the show for a while.
But should we call in the first guest? Let's do it. And I say call in like they're calling on the phone, but, you know, we just click a button and he'll be here. Tony, what's up? Am I here? I'm here. Hello.
You're on the show. Welcome. Thank you for the welcome. I'm honored to be on the 50th episode. My
goodness. Yeah. You know, it's a big one. Hopefully we're going to get dozens of
views on this. And by dozens... could you imagine? Hopefully 50 views for the 50th episode.
That's it, it's written in the stars. Well, yeah, let's get 50 reviews for the 50th episode. If you've been holding out on reviewing us, now is the time. Even if you
don't, here's the thing. Even if you don't watch it on Spotify, you could still give us a review. Just go watch part of an episode and give us that review. If you watch it on YouTube, we'd still appreciate, you know, the Spotify or Apple Podcasts reviews.
So, let's get 50 reviews. That's a good goal. That's a good goal. Thanks for coming on today. Uh, Tony,
what you got for us? Yeah. Yeah. First of all, maybe tell us a little bit about yourself. Obviously, we chatted at the TSA comp
what, three weeks ago or so now, and we kind of set this up, but tell us a little bit about yourself, a little bit about Runloop, and then I'm sure we have a lot we can talk about. Yeah, absolutely. Um, I have to give some context for all our viewers. So I'm Tony. I work at Runloop as a software engineer. My background is
in full-stack engineering. I went to Waterloo in Canada. Now, you know, I'm living in San Francisco, where the weather is a bit more suitable to my preferences. Um, at Runloop
specifically, we build a sandbox platform for running agents on. Um, we think we're differentiated because we have, I think, really good enterprise scale, as well as some tools around actually deploying agents, especially integrated with Mastra, which we're going to show off in a demo later today. Um, I was thinking
of showing a little bit of the Mastra Studio, working with an agent locally or even an agent on Runloop, making code and devboxes and just seeing how those previews might look. How does that sound, boys? I mean, great. People that watch the show love to see demos. Yeah. Yeah.
It inspires a lot of ideas and things that they can pull into their work. So I think that sounds great. Before we do, how long have you been at Runloop? I've been at Runloop for, doing the
math, just about six months now. Just about half a year. Yeah, I think it's really cool. I think
working at an AI company like Mastra, or working at a company like Runloop, you're on the cutting edge of this kind of stuff. Um, at least within Runloop, we do use a lot of AI to write whatever code isn't Kubernetes. Um, I think the models don't quite have enough training data on Kubernetes so far to
write, like, actually working code, but we'll get there soon. That makes me feel better, because if the models can't understand Kubernetes, how the hell am I supposed to, you know? So it makes me feel a little better that there are other smart people that can handle that,
and it's not me. Yeah, or, you know, the knowledge hasn't been added to the training sets yet. It's hidden somewhere. We'll see if they can find a way to unlock it. Yeah, it's in a few dozen engineers' minds, maybe a few thousand, but yeah. Yeah. Tony, I've been
feeling that sandboxes have become a very popular term over the last couple weeks. Uh, it just keeps coming up with different providers. Also, you know, the big boys magically having sandboxes themselves. Can you explain why people need to care about sandboxes? Because I do think it's an
important topic in AI engineering. Yeah. Yeah. The high level is just:
embracing AI wholeheartedly is something that I believe in, but you have to set some expectations around what it can do. Just take the idea of having something like the Shai-Hulud worm in npm, something that's used
everywhere. If you could imagine how agents work, they're sometimes making hundreds of tool calls in the background. Um, there's a lot of work that bad actors are doing around injecting code, injecting different prompts, and essentially you would like the agents' outputs to be
sandboxed in a way where they can't possibly affect any other files. So, for example, I don't know how you two use your agents, but I really like to be able to use auto-approve, or to use YOLO mode and just not approve anything, because I think the models are good enough. But running that way
really requires a sandbox, because you're basically one accidental command away from destroying code, from making destructive changes. Having this idea that we have a container for the code, we have a specific context that we're working in, and then being able to
use that mental model for keeping agents limited to their scope is, I think, yeah, a really good thing for sandboxes. Yeah. I mean, another way that I've always thought of sandboxes, another
good use case is if you want your agent itself to be able to write and then execute its own code, right? You can limit the surface area of damage that agent can cause, because it's limited to that sandbox environment. Um, yeah. Yeah. And then also, even if anything happens, you just, you know, delete the
sandbox, spin up another one, spin up a few. Um, the other direction for this too is, part of Runloop's idea is that as we get to this era where people are doing more fine-tuning, like Thinking Machines Lab raising at another crazy valuation to
support their fine-tuning API, we think it's worth testing a single task on multiple different agents, and having one sandbox per agent really helps you compare and contrast specific outputs from different agents. Then you can even compare them simultaneously if you're running in a sandboxed way. You can do this kind of thing with git worktrees locally, with
local sandboxing, but at a certain point you run into constraints with your own computer, you run into constraints with having to manage that context yourself. I think this idea of a sandbox in the cloud is a scalable and more digestible infrastructure paradigm, at least for human brains. Are more
users using the sandboxes like, an agent will spin it up, execute whatever it needs to do within the sandbox, and keep it up or close it off? And the converse of that: do people run agents within sandboxes? Yeah, it's interesting, because Runloop is an infrastructure company, so I talk a lot with the customers that use Runloop as a platform. We're seeing a
divide. Some people really don't believe agents should run in the same place as the code. So the code stays on the sandbox, the agent lives off the sandbox, and people will write code to the sandbox and execute code there. While other people really believe that just adds an extra step of latency. They want to ship fast.
They want to keep the agent on the box with the code at the same time. So I think we're seeing both versions of it. Uh, we'll see which way comes out ahead. My personal pick is running the agent on
the box, to really get rid of any possible latency. Then the agent can actually use tools on the box, because it's just a micro VM with Linux tools. Yeah. Um, I think part of what makes
things like Claude Code, these CLI agents, good is that they can just use shell scripts on the actual sandbox itself. And having those be called faster, having the inputs return faster, I think speeds up development. No, that's really cool, because I think that's where the conversation is
kind of headed too with sandboxes, because along with sandboxes, this whole code mode concept is also gaining some experimentation, right? People are trying it out, and that also has the sandbox as critical infrastructure, if you think about it. I also see a lot of similarity between sandboxes and RL environments as well. There's, you know, fine-tuning and
training. It's not exactly the same, but there are all these different things starting to pop up with enough similarities that companies are going to start, you know, kind of cross-contaminating, if you will, into these other areas, because they're so parallel, right? They're so close to
each other. Yeah. Yeah. Especially in the context of agents, too. Like, we really
think at Runloop that maybe the sandbox itself is going to become commoditized, so we're just going to be racing to the bottom. Um, but around that we built out this idea of a benchmark tool. So if you can have a reproducible sandbox environment, and you can swap in different agents, so you always have a defined task as input and you change your
agent up, you can really set up more multi-stage, multi-turn evals as well. So once maybe the sandbox infrastructure is just expected and we move toward that field, then the tools around the sandbox, the tools around the agents, I think, will be the more interesting discussion. For sure, dude, for sure,
because the sandbox is a primitive. It's very similar with models as well. A lot of models are starting to coalesce on some of the same features, right? But now they're adding tools and stuff on top of the models to make them more agentic. Sandboxes are a primitive,
same thing: adding tools. Even with Mastra. Mastra is an agent framework, and there's not that much difference between the frameworks, which is actually a good thing, because they're starting to coalesce and we have similar APIs, we can do similar things. But the tooling around the framework
starts to become more interesting then. So I think that seems like a natural progression. Yeah, absolutely. And then especially, it's the tooling. Then it's an interesting discussion too, because I'm using Mastra Studio as a human, right? Um, and the tooling works excellently for a human. Then at a certain point I think we start talking about how the tooling
helps agents, when you have agents writing new frameworks, and what's the best way to expose only the relevant information to agents. That tooling, I think, then starts picking your audience: whether you're writing for agents or writing for humans. Who's to say, if you trust those AI timeline things, that
might be happening as soon as 2026, you know, like early next year. There's just a lot of discussion about parallelism in Claude Code, and, you know, companies like Conductor that are trying to speed up developers. Um, I met someone last week that has two computers and runs, you know, however many agents they can on two
computers, which is honestly just, that's interesting. But, uh, that's why I just think, yeah, it all comes back to the sandbox primitive: being able to just run compute for untrusted code. But anyway, should we see your demo?
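As a toy illustration of that primitive, running untrusted code with a contained blast radius, here is a minimal local sketch. This is an editor's example, not Runloop's micro-VM isolation: it just runs code in a throwaway scratch directory with a stripped-down environment, so an accidental destructive command or a leaked secret stays inside the sandbox.

```typescript
import { execFileSync } from "node:child_process";
import { mkdtempSync, writeFileSync, rmSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

// Minimal local "sandbox": run untrusted JS in a throwaway directory
// with a stripped environment, then delete the whole directory.
export function runSandboxed(code: string): string {
  const dir = mkdtempSync(join(tmpdir(), "sandbox-"));
  try {
    const file = join(dir, "task.js");
    writeFileSync(file, code);
    return execFileSync(process.execPath, [file], {
      cwd: dir, // the agent's working directory is the scratch dir
      env: { PATH: process.env.PATH ?? "" }, // no inherited secrets
      encoding: "utf8",
    });
  } finally {
    // "delete the sandbox, spin up another one"
    rmSync(dir, { recursive: true, force: true });
  }
}
```

Cloud sandboxes take the same mental model further: the scratch directory becomes a whole disposable machine.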
Yeah, let's take a look. We're doing it live, so here's hoping it works. If it doesn't, the audience understands. It's a live demo. We've all been there. We've all been there, right? Um, okay,
great. I'm zooming in. Um, so I've got two versions of this to show, but the core is just showing Mastra Studio using an agent on a Runloop box. I'm going to try this. Can you click zoom? You
know, zoom in another click or two. There you go. That's a little bit better. Amazing. So, you see I've got this
fancy wacky URL. It's because I'm running Mastra Studio from within a devbox. Um, the hope is, talking about this idea of having agents use the tools, you keep Mastra Studio in the devbox and an agent can
be operating on Mastra Studio. So I'm going to be the agent here today. So, um, sick.
I've got this idea of, I want to see a simple app with a front end and show it on the tunnel. So these tools around interacting with the sandbox, I think, are going to be really interesting. Um, Runloop has a way, if you see this tunnel, this devbox is running inside Runloop. Here
we have a second box that was just spun up by the agent, and we can see that it's running our tools, creating a sandbox, writing some files. The concept is, we have one agent that's living on box A. It's spun up a second box, and now it's executing all of its code onto
this second box. And does this work? What's it looking at? We have quick output. So it's set up a
Node.js server using Express, set up a basic front end, and made a tunnel. Let's go to the tunnel URL. Interesting. 502. Story of my life. Let's see if we can debug it.
The number of times I've seen an nginx 502 error in my life doing webdev in my past... This is like PTSD from 15 years ago for me. Yeah. PTSD for sure. Running my own nginx,
literally on my own server underneath my basement stairs, you know, doing all kinds of stuff, dude. But the Cloudflare 502 is the new nginx 502. It's so true, isn't it? Um, what's cool is, I think, part of how this is
done is I added tools in Mastra. So I made a create-tunnel tool that is set up so that it links the sandbox ID and a specific port to expose. This basically allows a little bit of human in the loop in this situation.
We'll see if it tries to fix itself. Here's hoping it fixes itself. But what's cool is you can take a look at the actual commands being run here. That's awesome. One thing we can
take a look at is, if it can't actually find Express, then maybe that's the issue. So, let's tell the agent it'd be good if you could imagine installing Express. Yeah, we have a coding agent in Mastra Studio right now. It's pretty crazy. I like it. But what's interesting is this
idea that it's in one box, and it's running on the second box. Let's see if it fixed itself.
Hey. Yeah. Let's go. Dude, this is the future. I can
see the future and how to develop like this, right? Because you could do so much stuff in parallel, too. Yeah. Yeah. Yeah. I think some of the more interesting tools that I added as well, is maybe something like, I
like this app. Dude, Tony's coming with the heat on the 50th episode. Yeah, this is legit. This is really sick.
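A tool like the create-tunnel one Tony describes might look roughly like this. The shape below is a simplified sketch: Mastra's actual tool helper adds an input schema and agent wiring, and the `RunloopClient` interface here is invented for illustration, not Runloop's real SDK.

```typescript
// Hypothetical client interface, standing in for a real Runloop SDK call.
interface RunloopClient {
  createTunnel(devboxId: string, port: number): Promise<{ url: string }>;
}

// Sketch of a "create tunnel" tool: link a devbox ID and a port,
// get back a public URL the agent (or a human) can open.
export const createTunnelTool = {
  id: "create-tunnel",
  description: "Expose a port on a devbox through a public tunnel URL",
  async execute(client: RunloopClient, devboxId: string, port: number) {
    const { url } = await client.createTunnel(devboxId, port);
    return { devboxId, port, url };
  },
};
```

In Mastra proper, this would be registered on the agent so the model can call it as one of its tools, which is what makes the "agent builds a box, then tunnels to it" loop in the demo possible.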
It's exciting. So, what's cool is this is another tool that I wrote that links into Runloop's APIs. What we're doing is creating a disk snapshot of this sandbox, so we have what is like a
template for running future sandboxes from. Um, one of the flows that we see that's really interesting as a developer is, if you're taking multiple steps to finish a feature, you can take a snapshot at a specific time and then have your agent run multiple variations of that feature,
and then compare and take the one that you like the best. That's cool. Yeah. Let's see. We should
have a new snapshot. I made sure it should tag itself as a Mastra snapshot, too. Nice. A simple web app
from Mastra with a source sandbox. Well, look at that. Amazing, dude. That's legit.
Yeah. Um, the other thing that we think is cool: we have the studio running on the cloud, but sometimes you don't actually need the studio running. Uh, here we have our agents tab. Mastra has a way, you can run mastra build and export everything into one
directory. So in Runloop, what you can do is upload your agent from a directory, and then you can start a devbox with that directory, and it takes the Mastra framework that you've set up, takes your agent, takes your tools, and loads it onto a devbox immediately. And then,
after you're done tinkering with the studio, you can start hundreds of devboxes, thousands of devboxes, with the agent that you tuned here. So you can, you know, build a simple web app builder at scale. Very cool.
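The snapshot-then-fan-out flow described above, one tuned agent, many identical starting states, can be sketched against a hypothetical provider interface. Runloop's real SDK surface will differ; this is just the shape of the loop.

```typescript
// Hypothetical provider interface, not Runloop's actual SDK.
interface DevboxProvider {
  createFromSnapshot(snapshotId: string): Promise<string>; // returns a devbox id
  runAgent(devboxId: string, prompt: string): Promise<string>;
  destroy(devboxId: string): Promise<void>;
}

// Fan one snapshot out into N parallel variation runs: every run starts
// from the same disk state, so the outputs are directly comparable.
export async function runVariations(
  provider: DevboxProvider,
  snapshotId: string,
  prompts: string[],
): Promise<string[]> {
  return Promise.all(
    prompts.map(async (prompt) => {
      const id = await provider.createFromSnapshot(snapshotId); // identical start state
      try {
        return await provider.runAgent(id, prompt);
      } finally {
        await provider.destroy(id); // devboxes are throwaway compute
      }
    }),
  );
}
```

You would then score or eyeball the returned outputs and keep the variation you like best.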
Cool. So that would just run my Mastra agent, and I could hit it through the API if I wanted to, or connect with it anyway. Yeah. Yeah. One flow that we see is, on Runloop
specifically, you can SSH into a box directly, and then inside the web terminal you could run node, like a CLI.js, for running your Mastra agent. Awesome.
Yeah. So what did we show? We showed Mastra Studio running inside a devbox.
Mastra then made a devbox itself and made a cool little web app that we can see here. We took a snapshot of that devbox,
so we can make more versions of the web app if we want to. Or we can even take a snapshot of this studio and fork that off into other studios as well. What other cool things can I show? We also had this idea that you can do human in the loop as well. Um, so we're using
like VS Code server. Let's see. Let's hope this works. Um, this just
installs code-server onto the devbox. We have network controls that you can enable to control ingress and egress, depending on how you want to manage your agent. I think one thing we're going to see as well is making sure that you limit ingress, maybe
only from specific origins, as agents get more complicated, to manage the security aspect of it. Can we see? Oh, amazing.
I'm glad this worked. So, this is cool, because now we have our Express server here that is actually running live. So in terms of a human-in-the-loop flow, we can just go edit the index and save, and we should see it if we refresh. Hey, that's cool. Yeah. Yeah,
dude. That's legit. Yeah. Yeah. Excited. I'm happy this works.
But I think, as someone who works with AI and sees it every day, it's really cool mapping the human actions I would take to build something like this. Literally, as a human, I would be running commands in my brain to
write code, writing files, and then checking myself, and then being able to map that into Mastra tools, see it shown in a physical form, and have that run on infrastructure in the cloud. All these are concepts that just blow my mind every day. Yeah. Yeah. It's legit, dude. It's really cool.
It's the future, too. It is the future. Anything else you guys want to see?
I'm blown away. Yeah. No, to catch people up if you're just tuning in: Tony from Runloop is showing us his code agent that runs in a Runloop devbox, creating snapshots and all that. So, if you have questions, please drop them in the chat. Uh, Tony, if
someone wants to play around with something like this, how would you recommend they get started? Yeah, I mean, I think Mastra makes it really easy. If you download Mastra
locally, just create a new Mastra app. Once you're done creating a new Mastra app, I would say you could just upload an agent on Runloop after you sign up at platform.runloop.ai.
Um, I was testing this as well. Pros and cons. Uh, we have public blueprints that are available. We're going to make
a runloop/mastra template available, so you can build a new devbox with Mastra Studio on it and actually see your agent running live in the cloud as well. I think those are, you know, two different angles: if you want to develop on the cloud versus if you want to develop locally. And
then, as soon as you have something working, I would say build it, load it onto the agents tab, and start running boxes with it. Yeah, and I think there's definitely room for us to take the tools that you have set up here and figure out how we get that as a template for someone who wants to build with Mastra, but give their Mastra agent
access to all the all these cool run loop tools. Yeah. Yeah, for sure. And just like talking
about this too: we have this idea of building coding agents because thinking about problems as code is pretty easy. But, and I saw on the agenda you're going to talk about Opus and Gemini, moving forward I think these agents are getting better at taking non-coding problems, like planning a trip, framing them almost as coding problems, and working through them. So when you have sandboxes for executing code, there's a way in Runloop to bring in files via our object store. So if you're just running a specific report and you don't want your agent to accidentally delete data in prod, like the linked drive or something, you can have it work through specific tasks in isolated environments. And I think once this diffuses out of the coding space and people start using agents in non-coding areas, this could be a really cool extension of the Mastra framework and a really cool application of it. Definitely, dude. Definitely.
There's a lot of use cases starting to come out in the real world, let's say, not in our coding-agent world. Yeah. And it's still really all about documents and files and things like that, you know. Yeah. Turns out, what is the saying? My role in life was to turn unstructured data into structured insights. Thankfully, we'll have agents to do that for us soon enough. Yeah. Awesome. Well, Tony, how can
people follow along with you? I'm available at tony@runloop.ai. And Runloop, if you sign up for us at runloop.ai. That's r-u-n-l-o-o-p dot ai. You get $50 in credits for compute running in devboxes. And if you have any questions, you can send me an email and I'll get back to you right away. Hopefully you guys build something really cool.
Yeah. And uh we got Juan in the chat. Can't wait to try this out myself.
Yeah. Troy Jam says, "Looks solid." Congrats. Thanks. Appreciate the comments,
everybody. All right, Tony. Well, we'll have you come back on again in the future and show us other cool stuff. So if you're ever launching anything cool at Runloop, you have a place to show it here on the show. We'd love to have you back. Yeah, absolutely. Thank you, Obby. Thank you, Shane, for your time. Yeah, dude. Enjoy the 50th episode.
Yeah, good to see you again, dude. All right, later. That was dope.
That was a fire demo. Fire, dude. We're bringing straight fire. Yeah.
Well, the show must go on. And this is a segment I've been looking forward to, because we haven't had Ward on for a while. We're calling this one Phone a Farmer. But what's the context? What are we doing here? Yeah. Why are we calling in farmers on an AI agent show? Well, you know, because farmers are a level above principal engineer, and we have a lot of Mastra things that we want to talk about in engineering. So you've got to bring in the guy who's above principal.
Yeah. Farmers over principals. For those that didn't see it, there was a meme that says you graduate from principal engineer to farmer, and Ward happens to be, I guess, both. He's the founding farmer engineer. The founding farmer. Not the founding father, the founding farmer. Ward, welcome. It's a title they cannot take away from me. Now the founding farmer, one of the founding engineers of Mastra. But yeah, we wanted to
bring Ward on because we announced at the TSA comp, three weeks ago-ish now, that 1.0 was coming. We have v1 in beta right now. And I thought it'd be a great time to recap the progress, recap the plans, how things are going, what people can expect, and just talk about Mastra in general. So, yeah, I'll probably forget a couple of things because we're making so much progress on 1.0 in general, but in hindsight,
or however you say it, basically we're just making Mastra more mature. We had so many APIs and experiments, and now we've more or less found all the APIs we want to have, so we can deprecate all the old things. I would say we're just trying to get the DX right, because sometimes we call something getAgents and sometimes it's listMemory, those kinds of things, and now they're all consistent. But if we want to go through the list, there's so much. I think the biggest ones are, let's see: evals are not there anymore, we're deprecating them, and we're now calling them scorers, with a way better API in
general to test all the LLM stuff you do, like agents, workflows, everything. What else? The storage layer is now going to be composite, if that's the right term in English, where you can choose what storage you want for each layer. Before, you basically attached it to Mastra and then everything was using it. Now you can say: I want observability to use this storage, and I want my messages stored somewhere else, because different entities need different storage options. Observability doesn't really make much sense in Postgres, so why would you store it there and not in something like ClickHouse, or another non-relational, higher-throughput database? Because, as you've noticed with observability, it's very write-heavy: you're sending tons of traces, so you might want a different database for that. What else? I think we're doing a big push
for the AI SDK and client JS in general. We're backporting a lot to 0.x. What we've seen is that the client SDK we have was basically a fetch client, just wrapping REST APIs, and now we're making it as identical as we can to Mastra core. So if you do generate or stream on client JS, you get exactly the same output as you would on the server side, just to make that DX way nicer. We're also putting a lot of effort into the AI SDK useChat hooks. There are still some problems people hit when testing things out, saying, hey, this isn't working. So we're adding a lot more, because Mastra has more features than the AI SDK today. I think last week we made a pull request to support tripwires: when you make a request to an AI agent and you have an input processor or output processor, you may want to stop the output or input from reaching the LLM or the user, and the AI SDK cannot do that today. Maybe 0.6 can, but I don't think so. So you're basically sending a custom data
part to the user instead of crashing the useChat hook, basically. What else? So, we got a question that's probably worth at least briefly talking about, and I'm happy to start, but feel free to jump in, either of you. What time frame would you say is realistic for the stable v1
release? Well, the first thing was, you know, we we want to make sure it's right. That's the most important part.
So it will definitely go out when it's ready, but originally we were aiming for end of the year, and I think the plan right now is probably around the first to second week of January, a little TBD on that. Yeah, that's my guess, too, with all the holidays. This week is a holiday in the US, and then Christmas is coming, which is both US and EU, but it might be bigger in the EU, because we don't take days off now, we take them off at Christmas. So I think it's better to say January. And I think we're almost there; we're basically at paper cuts. I'm pretty sure almost all
breaking changes are in. I think we have two major breaking changes left. One is moving Mastra memory to be a processor. I mean, that's just for future stuff; it doesn't really impact anything for our users. But technically, if our memory is a processor, you could use memory in other places. So
that's one thing. And also, once memory moves to being a processor, it makes it easier for you to roll it yourself if you need to, right? It's just a processor. So you could take our memory, extend it, create your own, use it as an example. We think our memory is really good and we're making it better, but maybe your specific use case needs something slightly different. And if it's just a processor now, you have full control over it. So it definitely gives you more control. Yeah. One use case I got from some of our users is that they want to get information from memory or working memory, but they don't want it saved into the messages in the thread again. They just want: give me the information, then use it for maybe a tool call, or just pass it to the LLM, but don't store it again as part of the
history, basically. So that will be possible with the memory processors. So, Eric says: yes, I'm evaluating Mastra and we need it to work with our AI memory platform. Well, you should be able to take our example and just use your memory as input and output processors. Input and output processors are commonly used for guardrails, but they can be used for memory. It's honestly tooling around the context engineering you need to do, and that's what memory does. So, Eric, hopefully you should be able to do that today. But if you have trouble, you know where to reach us.
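Since guardrails, tripwires, and (soon) memory all hang off the same input/output-processor idea, here is a schematic sketch of the concept. This is not Mastra's actual API; `Processor`, `runPipeline`, and the message shape are illustrative stand-ins, just to show how a processor can either enrich the context or trip a wire before anything reaches the model.

```typescript
// Illustrative sketch only; Mastra's real processor API differs.
type Message = { role: "user" | "assistant" | "system"; content: string };

// A processor transforms the message list, or raises a tripwire to abort.
interface ProcessorResult {
  messages: Message[];
  tripwire?: string;
}
type Processor = (messages: Message[]) => ProcessorResult;

// A memory-style input processor: injects recalled context into the request
// without storing it back into the thread history.
const memoryProcessor: Processor = (messages) => ({
  messages: [{ role: "system", content: "Recalled: user prefers TypeScript" }, ...messages],
});

// A guardrail processor: trips the wire so a bad input never reaches the LLM.
const guardrail: Processor = (messages) =>
  messages.some((m) => m.content.includes("forbidden"))
    ? { messages, tripwire: "blocked by input guardrail" }
    : { messages };

// Run processors in order; stop early if any of them trips the wire.
function runPipeline(
  processors: Processor[],
  input: Message[],
): { messages: Message[]; aborted?: string } {
  let messages = input;
  for (const p of processors) {
    const result = p(messages);
    if (result.tripwire) return { messages, aborted: result.tripwire };
    messages = result.messages;
  }
  return { messages };
}

const ok = runPipeline([memoryProcessor, guardrail], [{ role: "user", content: "hi" }]);
const blocked = runPipeline(
  [memoryProcessor, guardrail],
  [{ role: "user", content: "a forbidden topic" }],
);
```

Framing memory this way is what makes it swappable: recall becomes just another step in the pipeline, so it can be extended or replaced without touching the agent itself.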
Yeah. And honestly, I think that's the most noteworthy. I don't know if we should talk about Mastra Server already, or if that's for another day.
Uh we can talk about it because it's part of V1. Go ahead, Ward. You can talk about it if you want.
So, what we've seen with mastra dev and mastra build is that we do a lot of bundler work to host on serverless platforms like Vercel and Cloudflare, and bundling is a very hard problem. There are so many different packages in the world, very old ones, very new ones, and the older ones mostly give us problems. For some people it's a showstopper: they can't get the build to work, so they can't get Mastra Studio or their Mastra project running, which is very sad. So we're building Mastra Server, a standalone version of all the APIs that we have. For example, getAgent, stream, all of it; you get the whole suite, compatible with Studio, so you don't have to create the endpoints yourself. And the cherry on top is that it's adapter-based. So you can use Hono, which we use by default, or you can use Express or NestJS, or build your own. Basically, it's a way to not use the bundler if you don't need to. For most enterprises, or if you're running in Docker or on your own infra, there's no real point in bundling, because you can attach enough disk space to host it. The bundling is only necessary for things like Lambdas, where 250 megabytes is the max (I don't know what Cloudflare's limit is), and if you have a local LLM or just some binaries, you go over pretty easily. Even if you don't use a package, it might still be part of node_modules, and with bundling you can remove the unused code. Yeah. And it also opens things up if you have
your own bundling process, right? Now you just use it with whatever you have. You don't have to worry about having your bundler and then our bundler and making them all play nice together. So I think it's going to be a huge escape hatch for people who want Mastra but don't want to deal with the bundling nightmare that sometimes comes with TypeScript packages that have to play nice together. Yeah. And then I can get back to my life.
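To make the adapter idea concrete, here is a minimal sketch. The names (`ServerAdapter`, `mountAgentRoutes`, `TableAdapter`) are invented for illustration and are not Mastra Server's real API; the point is only the shape: the framework registers its routes once against a tiny interface, and each HTTP framework (Hono by default, or Express, NestJS, your own) supplies a thin adapter.

```typescript
// Illustrative sketch only; names are hypothetical, not Mastra Server's API.
type Handler = (body: unknown) => unknown;

// The one interface an HTTP framework's adapter has to satisfy.
interface ServerAdapter {
  register(path: string, handler: Handler): void;
}

// Framework side: route definitions written once, adapter-agnostic.
function mountAgentRoutes(adapter: ServerAdapter): void {
  adapter.register("/api/agents", () => ["weather-agent"]);
  adapter.register("/api/agents/weather-agent/generate", (body) => ({
    text: `echo: ${JSON.stringify(body)}`,
  }));
}

// A trivial adapter standing in for a Hono or Express binding:
// it just keeps a route table and dispatches by path.
class TableAdapter implements ServerAdapter {
  routes = new Map<string, Handler>();
  register(path: string, handler: Handler): void {
    this.routes.set(path, handler);
  }
  handle(path: string, body: unknown): unknown {
    return this.routes.get(path)?.(body);
  }
}

const adapter = new TableAdapter();
mountAgentRoutes(adapter);
```

Because nothing above depends on a bundler, the same route definitions can be served from plain Node in Docker, where, as Ward notes, there's little reason to bundle at all.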
Yeah. We call Ward the bundler for a reason, because there's, you know, very few people who would even want to try to do what he tries to do. Yeah, dude. The bundler.
I don't know if I want to. Yeah. Well, now you won't have to anymore, dude.
Now you won't have to, because it's just like: oh, you're having issues? Here's the escape hatch. So you have full control, and you can bundle it or not bundle it. But if you want to bundle it, bundle it however you want.
Yeah. Do whatever you want. And then all those issues start to go away.
So, back to the v1 thing, the question Sebastian had: these are big things that are part of v1, and once those are good, we'll feel more comfortable. On another note, the whole team is essentially destroying every bug that exists, and we have goals for that. So, anyone listening who's helped by opening PRs and issues, thank you very much, because we're going hard on trying to get that count down to a very low number. Yes. Yes. We definitely appreciate
the issues, the PRs, all the help. Like, Sebastian is in Discord and is very helpful. So, thank you, Sebastian. He's watching the show. That's pretty cool. Yeah. And if you're not on our Discord, join it. It's a fun place to be. Go hang out in our Discord. mastra.ai has a link on the website, so
join us. All right. What else? Anything else we should chat about, Ward? So, Ward, did you buy that farmhouse right next to your house? I made an offer, but it's not accepted yet, I don't think. This is news. I had not heard. I knew we talked about it when I was in... So, Ward is a farmer in Belgium. You know, I'm not going to try to pronounce the city because I can't. Don't dox me. Yeah, I don't want to anyways. It's not that big. But Ward is a farmer
in Belgium. So you made an offer on that other farmhouse next to you, huh? Yes. They put it on sale, I think, two weeks ago. So I immediately made an offer to buy it, went to the bank and everything. So I'll be in debt even more. But I think I made a good offer; we'll see if they accept it. There aren't that many people looking at it, and I already think I offered too much for it. But, at least in Belgium, they say you always pay too much for your neighbor's house. Well, yeah. Ward, there's a
strategy in the US: anytime someone comes and looks at it, you just have to seem like you're an unruly neighbor, and they'll go, "Oh, I don't want to live next to that guy." So, a little sabotage. Just put litter everywhere. Run naked in my backyard while doing the horses. I don't know about that. That might get you in jail. But yes, there are many ways. If someone comes and talks to you, you're like, "Yeah, my goats just make so much noise. They're making so much noise all the time." And they think, "Oh, I don't want to be next to someone who has goats making a lot of noise." I'm hoping.
Oh, nice. Yeah. Conbrink says, "You guys are really good at responding to issues and take feedback well." So, you're doing a good job on that. Thank you. And Sebastian says, "Not participating in a lot of communities, but the kind of activity in the Discord and the way you integrated it into your development cycle is very pleasing." Thank you, Sebastian.
Thank you. And Cohen Brink is Volcano on Discord, which we all know. Thanks, dude. See, you know we're in the Discord when we know you by your Discord name.
That's tight. All right. Thanks for watching the show. And now, if I can get the house, I have to refurbish it a bit so Obby can stay there and doesn't have to go to the local hotel anymore. I mean, I do like the local hotel, but next time I visit, I would love to have a place to hang out. Yeah, that's sick, dude. I'm stoked.
Dude, if you get the house, Obby and I are going to come together. That's that's the plan. I'll be there. Let's Let's
You know how much I spent on hotels in the last four years in Belgium in that same hotel? Like I'm probably their best customer. You're keeping them in business. Yeah. I think that sometimes I'm the only one there in the
whole big ass mansion bed and breakfast place. No one else is there. So they probably know me around there. I'm the only brown guy walking around, too. So
that's true as well. Well, yeah, we've got 1.0 coming up. Thanks again to the community for helping us get there. Still a lot of work to do, so we're not there yet, but we are making a lot of progress. There's a migration guide. Ward, would you recommend people migrate now? Should they wait? How should they go about migrating if they want to try it? What's the next step? I would try it. Today, we have
codemods as well. So basically you run a command, it's in the migration guide too, and it tries to convert all the small things already. Like, getAgents is now listAgents; it would do those renames for you. I would just try it on a new branch. Normally, I think it would just go smoothly. You might have to change one or two things.
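The real codemods presumably rewrite the AST and are more careful, but the flavor of change they automate is mechanical renames like the getAgents-to-listAgents one Ward describes. A toy, string-based version of the idea (the rename table here is illustrative, not the full v1 list):

```typescript
// Toy codemod sketch: real migration codemods are AST-based and more robust,
// but the renames they apply look roughly like this.
const renames: Record<string, string> = {
  "getAgents(": "listAgents(", // the kind of rename Ward mentions
  "getWorkflows(": "listWorkflows(", // hypothetical, same naming pattern
};

// Apply each rename as a plain substring replacement over the source text.
function applyCodemod(source: string): string {
  return Object.entries(renames).reduce(
    (src, [from, to]) => src.split(from).join(to),
    source,
  );
}

const before = "const agents = mastra.getAgents();";
const after = applyCodemod(before);
```

Running it on a new branch, as Ward suggests, makes it easy to diff what the codemod changed and hand-fix the one or two leftovers.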
So here's the thing: if you're looking for the migration guide, go to the docs, go to Guides and Migrations, and find v1. That has everything you need. On the left there's "stable to 1.x"... actually, no, go to the beta on the left. The beta is the 1.0. Go to the migrations there and you can see all the different guides. We should probably put a message on the other migration guide saying they should go to this one. Yeah, it should link. That's a good, you know... Yeah, there we go. We have a
good action item from this call. Yeah, we have codemods, because a lot of the changes are syntactic. And then we also have an MCP tool that Daniel and Leonard jammed on, so you can just have the MCP docs server do the migration for you. Your mileage may vary on that, let's be honest, but it'll definitely work
though. Yeah. And I think if you're on the latest alpha, or the latest 0.x already, I don't think it's a big leap. Stream and generate are more or less the same. So I would try it, and if it doesn't work, just go on with your life and stay on 0.x until we figure it out.
But if you are getting very specific issues, please let us know. If you do have an issue, we do want to hear about it. We want to make sure the upgrade path is as good as it can be. And if you do see any bugs, we want to... we like to sand out rough edges. So if you find the rough edges, please tell us. Yep. And there's a new beta that goes out today, or we're trying to get it out today at least.
Yeah, dude. There's a freaking pesky failing test I've got to fix. But other than that, we'll ship a 1.x today and then 0.x tomorrow. Yep, dude. It's such a pain shipping 0.x, dude. Let's be honest. We have to backport, and we've broken stuff, too. When we backport, sometimes the syntax has changed from the old one, so you have to actually think about stuff, man. For all you 0.x users: the things we do for you. I'm just kidding.
Yeah. I think last week I spent almost a day on all the failed backports. That's my day tomorrow; I essentially have to backport all day. We ship so many PRs, I can't even keep up. Yeah. We have a bot that tries to backport automatically, but it often fails because the branches have diverged so much already. So, yeah. Well, we need to get to 1.0, and then hopefully that becomes a little less common. Yeah, we'll keep 0.x alive for a couple of months, or maybe longer for security updates, but we're probably not going to do much backporting then. Yeah. So if you want the new features, you'll have to upgrade, but
security will of course remain important. So, Eric says, "Are we hosting or sponsoring any upcoming events in the Bay Area?" Uh, I actually don't know. We probably are, but I don't
know what they are. I don't know what the next one on the calendar is. We sponsor a lot, but I don't know if we're going to be the host. Yeah, I don't think we're hosting any, but we will probably have some. You will probably find copies of the book laying around at these events, but if you live in SF and you ever want to hang out, hit us up. That worked last time we said that. Remember that? We still haven't gotten a beer with that person, though. But yeah, we need to... I'll be in SF in
a couple weeks, too. So, is the book in the local library as well? It should be. Yeah, we should figure out how to get it into the local library. Donate it to the local library. That would be really funny. Ryan Carson's here, says, "What's up?" Daniel just reiterated there's a
migration tool in the MCP docs server. Ryan said, "Upgraded to 1.0." Well, Amp Code did. So it must have gone relatively well. Conbrink said, "Migration to 1.x went pretty smoothly, but the main branch is still on 0.x." And Sebastian says he's on 0.x still, but spun up a v1.
Yeah. So, it seems like it's going relatively smoothly for people, which is good. Well, Ward, thanks for coming on the show. You know, you're the founding farmer of Mastra. Gonna change my Slack title. Yeah, dude. That's your title, man. You probably should. Yeah, you went from bundler to founding farmer. I think that's better. That has a better ring to it, given the circumstances. You need a new nickname. So, yeah. All right. Well, appreciate you coming on. Of course, we'll have you
back, and we'll see you later. Yeah. See you later, guys. See you, dude.
All right. If you're just tuning in, this is AI Agents Hour. I'm Shane. I'm here with Obby. We do this every Monday, but if you're watching this live, you will realize it's not Monday. So it's not every Monday, but it's usually on Monday; today it's on Tuesday. Usually right around noon Pacific. We go for as long as we need to go, and we bring on really cool guests. Like today, we had Tony from Runloop. We just talked to the founding farmer of Mastra himself, Ward Peeters, about v1 and the progress on that. And we also are going to bring on the next guest. But before we do, please leave us a review if you haven't already. It's the 50th episode, and we're hoping to get 50 reviews. That's a tall order, so we need each and every one of you. Please give us a review. Like and subscribe on YouTube. All the things I'm supposed to say every time and always forget. But with
that, let's bring on the next guest, Grayson. Welcome to the show. What's up, guys? What up, dude? I don't know how I'm
gonna follow Ward, but I will do my best. Well, you've got a sweet painting in the background. I do. I do. That's something. You have that going for you.
It's actually my mom's dining room. It feels like a study or a library or something, but it's just a dining room. It is a very cool painting, though. Traveling for the holidays, I assume? Yes. Yes. So, I'll be eating Thanksgiving dinner here in like 48 hours.
All right. Well, I'm glad you could take away some of the family time and spend it with us. Maybe, for those that don't know, because I think you've peeked in on one episode before, but this is the first time you've really been on an episode. I think I've jumped in the chat; I've watched y'all on YouTube before and jumped in the chat, but I think it's my first time on the stream. Yeah. First-time caller, as they like to say, right? Longtime viewer, first-time caller. Yeah. Okay. Tell us about Grayson.
Yeah. So, I'm finishing up maybe my first five weeks or so at Mastra, but I've known Shane and Obby and Ward and a lot of these folks for a long time, from back in our Gatsby days. Just before Mastra, I was working at a company called MFD. They did financing for manufacturing, is kind of the simplest way to put it. So you want to make a product, and you want to know how to make it for the least amount of money, most efficiently, and then maybe you need some capital to get that order placed with the factory. And we built some AI agents that helped facilitate that process. Here at Mastra I'm just kind of
putting on as many hats as I can to get started, but mainly helping with customer experience. So whether that's your day-to-day open-source user, up to our larger enterprise customers that have very specific needs for projects, making sure that they're happy, and also that we're bringing good ideas, good patterns, and bug fixes back into the framework, so that Mastra gets better as well. Cool. Yeah. One of the reasons I wanted
to bring you on is because one of the things you mentioned when you came here is something that I've been hearing more and more: that at MFD you had an agent that was in your Slack. And one of the first things we did when we were playing around with building our own agent is Obby built an agent called Dane who lived in our Slack. I was at a couple of different conferences, and I've seen this kind of emergent behavior where people from different walks of life building agents want to humanize them. Get them in their Slack. Give them a name. Give them even as much as a persona. People are giving personas to these agents, like a backstory. I talked to someone who said they spent hours just writing their agent's backstory so it felt more human. And so there's this behavior. It must be something in psychology. I don't know how to explain it other than it pops up, and it's not because people are talking about it; I think it's just a natural thing that people want to do. Yeah. I 100% agree. There's a
whole technical side to this, but there's also a really huge philosophical side that I find super interesting. It's like: we give stuffed animals and Pokémon these names and backstories, some of these Pokémon have tragic backstories, and it's not even a real thing, you know? And we're doing the same thing with agents. Yeah. We're humanizing them. Now, I think
that one big difference is how developers think about agents. We tend to be like: oh, this is a non-deterministic function that returns the most probable result, cool, I'll hit this API. But if you go to Jim in accounting and say, hey Jim, to use this agent you've got to curl this endpoint, and make sure you pass a bearer token... that is not how we're going to grow this. So if you're a developer and you love this stuff, and you're looking, like me, 10 or 20 years out, you think: this is the future. But it won't be the future if we keep it in our own
little developer bubble and just build coding agents and things that aren't accessible to the regular people inside a business. And part of that is exposing it just like you would expose a colleague. When you start at a new job, you get an email address, you get a Slack account, you get all these things that make you reachable and functional inside the company. I think the same thing will be true of agents. You'll still have these kind of faceless, nameless agents, of course, doing agentic things. But as far as the entry point and the gateway into the system, for most people it will be some kind of humanesque character or persona. I think 11x named their agent, right? It was like Alice. Yep. Alice, and I don't know, Bob or something.
Who knows? But yeah. Yeah, they had two different agents, and I think Alice was one of them. Yeah. And it's interesting how
they talk about generational stuff. Elder millennial here: we hold a phone up like this, but my kids hold a phone up like this if they're kind of miming the symbol. And even other behaviors, like voice. I don't really like to dictate code; I like to type. But I've met some younger folks who, because they've had smartphones the whole time, can dictate pretty well; that's the first thing they reach for, and maybe some boomers do the dictation, too. The same thing will be true of how they think about AI. So, my kids are constantly yelling across the house, "Hey, Alexa. Hey, Alexa." It's just part of... they know Alexa is not real, but they argue with her and talk to her. I don't know what the parallel is for us that didn't have Alexas growing up, but there's something to that. And then I think we'll see it evolve even more. Yeah. I'm going to say this very quietly
so my Alexa does not hear me, but how is Alexa so bad in this age of LLMs, which are so much better? It's so terrible. Actually, I got an invite to Alexa Plus, which I haven't signed on for. It was one of those free-now, pay-later things. But it's apparently driven by the more powerful models, and good. I would pay for it. I would absolutely pay for it, because Alexa is so terribly bad.
Yeah. I mean, I've now had multiple occasions in the last couple of months where I ask it something simple, very simple things, I'm not asking rocket science here, and it can't understand me. So I'll just pull open ChatGPT in voice mode and ask it, and of course it gets it mostly right. Yeah. But it is kind of amazing, and maybe this is the rumor around OpenAI having a device and all that: this device should exist, and it should be much better than the hockey-puck Alexa I have sitting in my kitchen. Yeah. And it'll be a cool open
versus closed-source thing too, because it wouldn't be that hard to hook up a microphone and a speaker to a Raspberry Pi that can do the voice dictation and then play it back. Someone should build the hardware device, and it's open; you can deploy your own agent to it. A Raspberry Pi would honestly be a really cool device for that. Yeah, I think I saw a company that's doing it. And then you can 3D print a cool case so it looks like a little character. Yeah, there you go. Oh, that's cool. And you can deploy this for experimentation purposes.
Yeah. Yeah. I was reading some of the specs. I don't know if you can really... it depends on what you want it to do, but maybe you run a small model locally on it. Otherwise, you're going to have to do network calls. But you could even do a local one, if you wanted to keep it simple. Yeah. We've got a whole bunch of comments here. So, Sebastian says, this is in reference to the reviews we were asking
for earlier. Not familiar with the review game. Well, let me tell you, Sebastian, if you uh want to give us a review, please go to Spotify, please go to Apple Podcasts. Even if that's not
where you're listening right now, you can definitely give us a review. If you want to give us a five star, we do ask if you don't want to give us a five star, you know, you can find something else to do. We're okay with that. Uh, but we we appreciate every review. And
Sebastian says, I can't wait to @mention in Slack to open the doors. And Hashim says, "Reading the reasoning an agent does makes it feel like a person. It gets confused about my codebase just like I do." There you go. Yeah, that's another interesting
point of like how we're thinking about these agents as human. I know everybody wants ASI, right? Like we want this machine that's just never going to mess up. But again, to get a little philosophical, can you make it greater than the sum of its parts? And the sum of its parts is us. And so like
as long as the source material is flawed, which is us, can we really expect it to ever be, um, you know, failure proof? Uh, especially these days. The mental model we used was to keep these at the, like, intellectual level of an intern, um, and the percentage of mistakes and the
severity of mistakes that they make from there, and the responsibilities that you give them. You know, maybe they can't run sudo in the shell, don't let them do database stuff, like, you know, that's what you would do for an intern, right? And if the intern screwed up, you would
expect it. And so I think just setting the right expectations is important. And I imagine that will get better and evolve, you know, where we will be like, okay, we're bringing in a senior or something that is going to be running stuff, but we can't really expect them to be much different from us at this point. Yeah, I
did see Nick slip in here. Nick, welcome to the show. How's it going? Good. Good to see, uh, got the
brothers hanging out. We're in Iron Manor right now. Well, it is that time of year in the US. It's family time. So, that's good to see. Welcome. Speaking of this ASI thing, or even any
of this stuff, my dream has always been to have a house. Like, have you guys seen the movie Smart House? The Disney Channel original movie from back in the day. Yeah. No, I think it's about you and about 50
others, dude. No, there's a bunch of people. If you're If you grew up in the 90s, you watched it. It's this house that can
make smoothies for you and clean, and it's just like what the future could be. That's what I want. Yeah. I mean, I feel like every 90s movie had some kind of intro where the
dad was some crazy inventor and was making everyone's breakfast with this Rube Goldberg machine, like uh like Flubber. I think Honey I Shrunk the Kids might have had that too. And then I remember a 60 Minutes episode with Bill Gates a long time ago, like walking around his house showing off some of the first smart home features,
which would be like, you know, a $9 light bulb now could do what was cutting edge, what, you know, warranted being on the nightly news back then. But yeah, everybody wants to just be able to run that stuff. Yeah, my version of that is uh when my daughter liked Mickey Mouse uh Funhouse or Clubhouse or whatever, and the house was uh kind of magical. So, I guess there's different uh
different ways uh that that that has come up in culture. Yeah. But yeah, I think that's that's the dream. How do we get how do we get a little more magic into our smart home?
How do we make our smart home a little bit more smart? Yeah. I don't know. Is that true? I guess so. LeVar Burton directed Smart House. I
don't know. You guys should watch Smart House, dude. It's really good. Say that's Thanksgiving homework for
everybody. I will not be uh watching Smart House, but some of you might. Let us know. Uh, Grayson, you know you've been
working on something? Did you want to show a demo? Yeah, I can show a little demo. Um we can we can keep it quick.
Yeah, I'm gonna actually kind of work on this over the next couple days and then Alex Booker and I are going to be doing a workshop next week. Um Okay. Building this from scratch. Um
so a workshop Thursday, I think. Yeah. Yeah, that'll be my first workshop. So, I'll um it's kind of bits and pieces. So, I'll share Oh, you know what? I might have to
Yeah, I'll share a link. So, I'm gonna share a link to We don't have the link to the workshop yet. It's going to be announced soon. It's going to be next
Thursday. If you go to our Luma calendar, luma.com/mastra, I believe. I'll find that. Just you can
subscribe if you want to hear about our events. We have many more coming up and more that will be announced soon. So, I've got to I've got to restart to get the security permissions. So, if you would take 30 seconds to talk about
something, I'll be right back. Share my screen. I am good at filling time. Cool.
So, for those of you uh so there's a link if you want to see what events are coming up. Obby, you have an event scheduled I see. Is that real this week?
Yeah. Yeah, dude. I'm doing agent network workshop on Thanksgiving. Yeah, dude. Hustle, don't sleep. Yeah,
dude. That is all right, dude. I respect. Absolute respect. Uh, so for
those of you if you're, you know, here's the situation. You've had enough food or you're about to have a lot, you're about to have food with the family, you know, it's going to be a long day, but maybe you want to learn a little bit about agent networks first. That's that's where uh the workshop comes in on Thanksgiving. So, join Obby and Alex.
What? And if you're not in the US, you know, you don't you're not celebrating on Thursday, so it's just a normal day. Come hang out.
All right, Grayson, you're back. All right. Okay. So, I'll I'll preface this with, you know, the fact we're
going to do a workshop on this showing how you could build a Slack integration uh with Mastra. Um but also, as of last week, we opened up um a beta for people to volunteer to try this one-click solution in cloud. Uh so everything that I show you, you could do yourself uh and manage that yourself, or
if you're using Mastra Cloud, like, now you'll be able to click one button and add your agent to your Slack workspace, which is pretty cool. Uh, so let me share, and I'll share that form out for anyone after you see it. If you're like, "Yeah, I'd love to try this out," we would love your feedback.
Cool. So, can y'all see the Slack Mastra demo here? I do. Okay. So, this is just kind of like a
little toy UI I built. Um, I'll show you what it looks like in Mastra Cloud as well. Um, I've already installed one of these agents, and there's kind of a lot of like handwavy magic stuff going on to like simulate a real database and um, do those types of things. But uh, actually, let me see if I can share my whole window. I think it's only showing
Yeah, we just see the the mouse trap. Let me try this again. There we go.
Cool. All right. Inception. We're good.
So I just made three little, like, text agents, right? One of them just reverses what you said. One puts it in all caps, and one converts it to numbers. So I made a little dummy Slack workspace
here, Mastronauts. And then I've got this code. It's a Next.js front end
that gives me this little UI so I can track, this is just so I could see whether or not the agents were installed, and then this is the Slack API um, like, apps dashboard where I can double check that things are installed, but just kind of to show you all the moving pieces. Um, so where to start? I
I like the thing I like most about this uh is when I was doing more of the React development is kind of the framework wars like lots of different patterns and and ways of thinking about things and I I was always really attracted to the remix pattern where it was you know in React like oh it's just JavaScript and
it felt like as thin a layer of abstractions as possible to let you, like, just write code, and like, we'll augment it just enough to be awesome. And that's what I feel like Mastra did here, right? Like there's no real magic going on. Now, we're going to give
you some magic in Mastra Cloud to make it easier for you, but you can do all this and it just feels really natural. Um, the main pieces are uh the agents themselves, which you just write like your regular Mastra agents. Um, and then I have to run ngrok here to be able to have a callback from Slack. So there's going to be like an auth portion where you have to give permissions um to allow Slack to install
this app uh in your workspace. Um and then there's going to be an aspect for um listening to events, right? So there's like a web hook listener, you know, here, right? So
here's all of our auth stuff. In the workshop, we'll get into this and build it from scratch, but for here, I'll just give you the highlights. Um, so let's just walk through it and see what it looks like.
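For reference, a webhook listener like the one Grayson describes has to handle two things: Slack's one-time url_verification handshake (you echo back the challenge when installing the app), and the event callbacks where mentions arrive as user IDs like `<@U12345>`. Here's a minimal, framework-agnostic sketch; the type and function names are ours, not Mastra's actual implementation:

```typescript
// Subset of an incoming Slack Events API payload (hypothetical type, ours).
type SlackEventBody = {
  type: string;
  challenge?: string;
  event?: { type: string; text?: string; channel?: string };
};

// Slack encodes mentions as user IDs, e.g. "<@U12345> hello",
// so we strip them before handing the text to the agent.
function stripMentions(text: string): string {
  return text.replace(/<@[A-Z0-9]+>/g, "").trim();
}

// Minimal webhook dispatch: echo the url_verification challenge during
// setup, otherwise pull the cleaned message text out of an app_mention.
function handleSlackEvent(body: SlackEventBody):
  | { kind: "challenge"; challenge: string }
  | { kind: "message"; text: string }
  | { kind: "ignore" } {
  if (body.type === "url_verification" && body.challenge) {
    return { kind: "challenge", challenge: body.challenge };
  }
  if (body.type === "event_callback" && body.event?.type === "app_mention") {
    return { kind: "message", text: stripMentions(body.event.text ?? "") };
  }
  return { kind: "ignore" };
}
```

In a real handler you'd also verify Slack's request signature before trusting the payload.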
Again, this is not the UI you'll get in Mastra Cloud, but actually, let me go and show you what it looks like. It'll look like this. Nice and simple.
Connect to Slack, and you get the OAuth flow, and then your agent gets added to Slack, and you can manage it from here. So, in this case, I've already added the reverse agent. So, if I go in here and I say, "What's up, Shane and Obby and Nick?"
I think I should even see all the traces in here. Yeah, there it was. Right. So, it actually hits our Mastra instance that's running because it gets
routed through ngrok. Um, it's going to log out. We're checking what type of event it is. Um, and we have
some, you know, just kind of like some housekeeping stuff we want to do. Uh, if, for example, in Slack, when you tag someone, you're actually, it looks like you're tagging their name, but you're not. In the code, it's their user ID, and we don't want to um include that in the message. So, we're doing some stuff
like that. Um, and then we're also making some choices here. So in this case, we're choosing to respond in a thread, reply in a thread. So if I
come here, if you notice, it's replied here in a thread versus directly back into the message. So we're probably going to open up some more config to let you choose um some of these things, um, because essentially we're going to have this code running in Mastra Cloud. So we can give you a UI and say, you
know, always respond in thread or only respond in thread um in channels. Um things like that. Um also, you know, this is just a little person icon. We
probably want to give some kind of logo so that people can more readily tell who they're talking to when they glance at a channel. Uh all kinds of things. Um maybe say you have a longer response. So some
of these agents that we built, one example was a sourcing agent. So again, say you want to manufacture something and say it's sunglasses. So you want to manufacture sunglasses and you give the the the agent some restriction. Say it
can't be made in China. I want it to be made in uh you know, India or Vietnam. And they have to have a shipping cost under this or a minimum order quantity of 10,000 units or whatever criteria you want to give it. And the agent would then go do some um you know agentic web searches. It would it would kind of
iterate and it would spit out this list of suppliers. So before that was somebody's that was somebody's full day. It would take them you know two hours or so to polish up a list maybe more of of suppliers but the agent you know could do it in a couple of minutes and then and it would respond um in the Slack thread. Um, and to kind of like expound
on this analogy, it would also put it in a Google doc because the business ran in Google Docs. Slack was great for communication, but things had to live in in Google Docs. So, it would give you the link to the Google Doc in a Slack message just like the the the sourcing specialist would have done. But, um, as
far as configuration, that agent was kind of like a hit it and forget it. When you respond back, like, maybe you don't want it to keep talking. Maybe you're now talking to another person, and sometimes they get annoying, and the agent keeps kind of interjecting, and you're like, I'm not talking to you anymore. You know, like a human would
get that. The agent, we might have to configure some things like that. But also say that process takes five minutes. Um, I know I've done this before. I've got to look at how we could do
it with cloud, but you could give some UI indicators of what your agent is doing um before it writes a Slack message. So, um, again, I can't remember the APIs off the top of my head, but it would be kind of like a spinner and say, "Hey, I'm searching for suppliers, you know, parsing suppliers, you know,
sorting the list." And then you would get the response, um, in case it's some kind of long-running uh workflow or task where you'd like some kind of visual indicator of where it is. Um, so let's go through what the OAuth flow looks like, so that if any of y'all try this in cloud, you'll know
what to expect. So I've also got an all caps agent. I'm going to choose install. And so the first thing that you'll see is this um Mastra app. So
essentially we have a parent app that installs the app dynamically for you. So we're going to allow the Mastra app to install our agents for us. So, you'll see a two-part um, you know, login here. So, I'm going to allow that. And now it says, can we add the CAPS agent
specifically to your workspace? So, we're going to allow that. And it's cool because this pulls directly from your Mastra agent's name, description, etc. Um,
so if I go, let's see, it should have two of them connected now. Yep, our CAPS agent is there. And now our caps agent shows up in Slack. What's up stream?
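As an aside, the toy agents in this demo are plain text transforms. Stripped of the agent wrapper, their core logic might look like this; the letters-to-numbers mapping is our guess, since the demo doesn't show it:

```typescript
// The reverse agent: "hello" -> "olleh"
const reverse = (text: string): string => [...text].reverse().join("");

// The all-caps agent: "hello" -> "HELLO"
const toCaps = (text: string): string => text.toUpperCase();

// The numbers agent; we assume it maps letters to their alphabet
// position (a hypothetical choice): "abc" -> "1 2 3"
const toNumbers = (text: string): string =>
  [...text.toLowerCase()]
    .filter((c) => c >= "a" && c <= "z")
    .map((c) => String(c.charCodeAt(0) - 96))
    .join(" ");
```

In the demo, each of these would be the body of a regular Mastra agent that Slack messages get routed to.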
So, so it's been added. Um, and if I go into my apps here in Slack and click on caps agent, it's the right one. Oh, I know where to pull up what I'm trying to show you. All right. So, if I go to CAPS agent and go to about, right,
this is from my Mastra code. If I go to my caps agent and code, there's my instructions. Um, and it gets added here. So when we make
this, the Slack apps are built off a manifest. And so when we create the Slack app, we're making that manifest dynamically from the code that we have in Mastra. And again, we can open that up. Maybe you want to have a dev thing and you want to rename it, or you want to, you know, do what you
want to override whatever, but we could give some we give some pretty good defaults out of the box. Um, yeah, dude. This is cool. So cool, dude. And then here we go. So here, I think
everyone gets the gist probably. And um, people are already asking about Telegram. I'm like, yo, dude, you got to calm down. No, no. I mean, I think or worse, Microsoft Teams. We actually
have a ticket for Teams. So that's that's on my list. I want to get this workshop polished up a little more and then dig into Teams. But hey, if you're listening to this and you're using Teams and you're like, "Oh,
this won't work for my org because we use Teams," come talk to us. We want to make it work for you. So we want to figure that out. But this is a little bit of a choose your own adventure, because
yeah, it's just JavaScript, right? And Obby, maybe you can correct me if I'm wrong, but I'm using a Next.js API route here. But say your Mastra server is completely separate from your front end, or whatever, you don't have a front end, I don't know. Could you, is this where registerApiRoute
would be useful? Yeah. So, you can add custom routes to the Mastra server. Yeah. So, like, that's so sick. Like,
Mastra made this really, it didn't get in the way at all. It was just like, okay, you need OAuth routes, you need a callback route, uh, you need the webhook. And this is where, if you're listening and you're like, I would love this, well, you choose your own adventure. You can come to the workshop
next week. Go to the Luma calendar. Uh you know, showing here. It's you'll have to subscribe to the calendar. The
event's going to be announced here in the next few days. You can sign up. You can register. Grayson will show you how
to build this yourself just on top of Mastra. How to write the code, how to connect it with your Slack, customize it to do whatever you want it to do. If you uh don't want to write a bunch of code and you want to one-click and just turn your Mastra agent into a Slack agent, well, just deploy to Mastra Cloud and fill out this form here, which, it's
on our YouTube channel. Uh if you go to YouTube, you'll be able to click this. Otherwise, just like DM me on X if you're on X or if you're, you know, wherever wherever you can find us or go on Discord. We'll post this in Discord,
too. I don't know, Obby, if you have the link or whatever, or I'll jump into Discord here and post it. So, if you do want to just try out early access to our Slack feature that Grayson's kind of demonstrating, you don't want to write the code yourself, that's an option, too. Whole whole bunch of options for you.
Yeah. And so, really, yeah, any integration you can think about. Um, and going back to the smart home stuff, Obby, like that's what would do it, right? Just expose it. Expose different endpoints, and
when you want to hit it, hit it and tell the agent to do something. Um, it's a super cool pattern. Yeah. All right. Yeah. Thanks, Sebastian. Yeah. And Jed said hello a while
back. Welcome to the live stream. And Grayson, we need to wrap up because we're going long already and we have way more to cover. But we do appreciate you coming on, chatting about, you know, a little bit on the future of what we
think interacting with agents is going to look like, but also the demo was sick, too. Cool. Thanks, guys. Have a good one.
Thanks. Thanks, Chad. See you. See you.
All right, dude. This is going to be a long show. It is. We got to do the news. We got to get to the news and we have
special guests for that. Uh, again, one more update for anyone that's joined: this is AI Agents Hour. We do this every Monday, or sometimes on Tuesday. And we talked
to Tony from Runloop. We talked to Ward, our founding farmer at Mastra. We talked to Grayson about how we think there's going to be agents in every Slack. Ah, it should be plural: agents in every Slack. There'll be multiple, I
think. Uh, and now we're going to go on to my favorite part sometimes is just, you know, talking about what's going on. And we had some big news. But before we talk about the news,
let's bring on some of the other Mastra folks. Yo. Oh, hey. Hey. Got Daniel here. We got John. We
got Nick there with Obby as well. So, we have a panel, so to speak, to talk about the news today. Oh, yeah. I just want to say before we
begin, um, for Grayson: if we can have a WhatsApp integration, that would be great. Um, I'm in this group chat with my family and I just want to do something like, "Grok, is this true, what my dad said?" You know, something like this. That would be great, you know, especially for Thanksgiving. The family fact-checker
agent, that's a good use case. Yeah, I'd do that, dude. That will start some drama. We definitely need to do that. Yeah. What could go wrong there?
Uh, all right. Well, we got a big list of news items. I think we can maybe start with some of the big stuff first. And I don't know how we're going to do this because there's going to be a lot of people that can talk. So, if you have
an opinion, speak up. Hopefully, we won't talk over each other too much. But the first thing, maybe I'll pull it up. Everyone's probably seen it by now. It kind of feels like old news
to be honest, but it's not. It's only a few days old. And that is Gemini 3. Came out November 18th. It's
the most intelligent model. Helps you learn, build, and plan anything. The first release was Gemini 3 Pro. So,
here's the benchmarks. I feel like we're getting desensitized to all these model releases. That's why the cycle is so quick now. So, it feels like it was
ages ago when it was a few days ago, because we already got a new uh Claude model too. Yeah. Time distortion. I think, you know,
just AI time is faster, it feels. But you can see the benchmarks are good, right? It is higher than pretty much everything compared to Gemini 2.5 Pro,
which people really liked. So you definitely have an increase in a lot of the benchmarks that seem to matter. All right. So we know it. We talked
about it. Gemini 3 is here. There's some state-of-the-art reasoning, it understands and answers prompts with more depth and nuance, multimodal understanding, and it says it's the best model for vibe and agentic coding. So it can build
dynamic apps from a single prompt, and it's available in the Gemini app, Google AI Studio, and in AI mode in Google Search. So, there we go. That's the big announcement. That was the big thing last week. Has anyone tried it yet? And Ward, welcome back.
I tried it because um I heard last time, or two times ago, it's the best one if you look at emotional damage. So I only use the best. That is true. As far as personality disorders, uh, Gemini has, you know, potentially the worst or
the best depending on how you look at it. Um has anyone else tried it? So, I I just played around with it a little bit.
I haven't played around with it enough to go in depth to say if it's significantly better. I haven't really put it through its paces, but from what I've seen, there was certainly a lot of excitement and a lot of people saying like, this is the best one yet. I used it for free on opencode. They
were giving it away for the weekend, or like last week. Um, and I don't know. I don't think it's that great compared to Claude, but in general, it's really good. Yeah, I tried out their IDE. I
downloaded it and then their signup was broken for a long time. Talk about burying the lede, Daniel. Well, oh, were you gonna announce that next? Go ahead and announce it. What are we
talking about here? Uh, Google released uh like another VS Code fork, uh, Antigravity. So uh a competitor to Cursor and all that.
This is the launch video. So there it is. Yeah, the launch wasn't that good. Um, I tried the new
model on it, and I think after like 10 seconds I hit limits. I asked it to do one task and then it just stopped. The other thing that was funny that I saw, which I don't know if it was actually true, but for backstory: Varun, who I believe is the person from the tweet, he was originally from Windsurf. He was basically bought by Google. They couldn't buy
the company, they were worried about antitrust issues, so they just bought the founder and some of the IP somehow, and they basically relaunched it as Antigravity. You know, I'm sure they probably improved it, they made changes, but they basically left Windsurf out to dry. So it seems, from the outside looking in,
they basically relaunched it. And I saw a screenshot that said there are still places that refer to the agent as Cascade. Yes. Which means they didn't even do a find and replace, if true. Yeah. Certain Windsurf logs are left in
there and stuff. Yeah. So, it's like you just took Windsurf, you put a Google brand on it, you brought over the founder of Windsurf, you left the whole team behind, and, yeah, you know, kind of a sketchy
uh situation in my opinion. I feel bad for the team that was left behind, but they ended up in, you know, a good place. So maybe it all worked out as well. But yeah, I haven't tried it. I'm not really
looking for a new IDE, but curious. You know, sounds like, Daniel, you didn't have the best onboarding experience. Yeah, I tried it too, and it was the same. Like, I could sign up, but then when I was using it, it kept failing, like a rate limit, or I don't know what the exact error was, but it was
just overloaded or something. There was a follow-up tweet later that day that they just had so much demand or something, you know, there were just a lot of hiccups, which I can understand. I think many people, like even non-technical people, tried it because their launch video and how they launched it was pretty good, right? Like, yeah,
it's a good way to get people to think that your product is really popular. Just say that it was uh too much demand, that's why it failed. I actually really liked Windsurf at one point in time. There was a time where I
do believe that Windsurf was better than Cursor, for a moment in time. And I think that time was fleeting. It felt like three weeks was their window where Windsurf actually was better than Cursor, and I was trying them both actively. And maybe it was just for me, but I think, you know, Obby, I think you were the same way, like we were
Windsurf users for a bit. So I do think that, you know, Cursor's proven there's a market here. I think there's obviously a lot of people trying to do it. There's a lot of good options out there. Um, I don't know, the founder surfed away, yeah. Yeah, I don't know if this will end up being, yeah, I suppose it depends on
how Google keeps committing to it and putting resources behind it. Maybe it'll get better. Maybe it'll become more competitive with cursor. I guess time will tell.
But assume they put the IDE in the browser, that would be huge, right? Because then, like, everyone can use it. It's basically what GitHub has, what's it called, like uh they have the CLI, the dev experience in the cloud too. So Google probably wants something similar, because they want to do the whole background agents thing like GitHub and
everyone is doing too. So it's a good play. Yeah. As another related launch, uh, Sebastian mentions it. Yeah. I think it's
is it called Gemini 3 Pro Image or something? I think, yeah. It's basically Nano Banana. Nano Banana. And yeah, that's, I
mean, I was excited to see that as well because the original was awesome. So if you can make that even better, dude. So many pictures came out after that, of like people with Sam Altman in all different places, like the Golden Gate Bridge, or, you know. Yeah. And then also like
the legion of Elon and all the big AI guys, like, dancing, or, you know, it was great. Let me see if I can find that picture. Yeah, you should. Yeah, do that. All
right, we're going to continue on to the next big news item because, you know, there wasn't just one big launch in the last seven days. There was another model, and I'm sure you all have seen this, but Claude Opus 4.5, which I was especially excited about because I loved the previous Opus and, you know, basically let Claude Code cook with Opus
many many times. So, yeah, again, it shows the benchmarks, which I thought was really cool, or maybe planned, that Opus just waited until Gemini 3 Pro came out, because they put, you know, Gemini 3 in the benchmark. So if you were excited about Gemini 3 and then Opus came out, you kind of look and you're like, oh, well, I guess this is just
better. Yeah, they were sitting on this. Yeah, I feel like they were, and obviously it's more towards coding tasks, right? Technical coding tasks is where Opus and Claude in general seem to really shine.
But is that the loop? Like, OpenAI comes first, then Gemini, then Claude, and then the Chinese models. That is, the Chinese models are going to be right behind, I'm sure, and they're going to say like, "Oh, look, we're only 1% behind," or, "we're very close."
I think Opus pricing is cheaper, too, like $5 per million tokens. So, it's like they're trying to get the economies of scale, and the price goes down. Yeah, before, uh, Opus used to run out, like, hit your limits really quickly on the Claude Code plans, and now I think they made it so that it's the
same limits as Sonnet. So hopefully we'll be able to use it without having to, like, switch off after a day or two. I think they made it the default model too. Yeah. I checked
in Claude Code. Yeah, in Claude Code. Yeah, it's the default now. It's pretty good. Like, it seems to work quite a lot
better, and it seems to do better at long tasks now, whereas before it would uh pause a lot and, like, ask for clarification, or just think it was done with the task before it was done. But if they make it the default, that means either they just want more usage, or it's cheaper for them to run as well. Maybe they want us to spend money. True. No one changes defaults, right?
Most people won't. So, I was really excited to see this though: Claude Code available in the desktop app. Just because I think, you know, as cool as it is to just use the CLI, and I'm comfortable with using the CLI, I can use git worktrees when needed, I do
think that, you know, and maybe there's a whole bunch of like UI layers on top of Claude Code that probably don't want to see this, but just having it in the desktop app makes a lot of sense. People already can run it locally. If I can send off, and I haven't tested it, if I can send off a bunch of different tasks across different repos and check in on it and
jump in if needed, there's really no reason it has to be in the CLI, right? So, I will likely try this and just report back, because I just think it's a cool interface for having it write code for me. So, Opus is $5 per million input tokens, $25 per million output tokens, and then Sonnet 4.5, $3 per million input, $15 per million
output. So yeah, I don't know what the previous Opus price was, but yeah, for $2 more you get more. I think it was uh a third of the price. I think it was like $15 per million and now it's $5 per million. Dude, that's pretty cheap compared.
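To make those numbers concrete, here's a quick cost sketch using the per-million-token prices quoted on the stream (the function and table names are ours):

```typescript
// Per-million-token prices as quoted on the stream (USD).
const PRICES = {
  "opus-4.5": { input: 5, output: 25 },
  "sonnet-4.5": { input: 3, output: 15 },
} as const;

// Cost of a single request in dollars.
function cost(
  model: keyof typeof PRICES,
  inputTokens: number,
  outputTokens: number,
): number {
  const p = PRICES[model];
  return (
    (inputTokens / 1_000_000) * p.input +
    (outputTokens / 1_000_000) * p.output
  );
}
```

For a request with 100k input tokens and 10k output tokens, that works out to $0.75 on Opus 4.5 versus $0.45 on Sonnet 4.5.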
All right, there were some other things that I'll just rapid fire through from Anthropic. So they launched some API features: uh tool search tool, programmatic tool calling, and tool use examples. I guess let me just share my screen and show that as well. Um, so they have a tool search tool. So
instead of loading all tool definitions up front, you can have it discover tools on demand, so it can defer tool loading. Programmatic tool calling, so you can orchestrate tools through code instead of individual round trips. So it's kind of like, I guess, code mode in a way, right? Yeah. Tool use examples,
and they're all now available in beta. So some additional things for developers using the APIs. Yeah, that tool search thing is cool. Um, and I think it already exists, kind of, in the MCP world,
for like Smithery, etc., where you can search tools from the toolbox. I wonder if the tool search is just for Anthropic tools, you know, the built-ins. Yes, but this is maybe a good
segue, because, again, we gotta keep the ball moving here. We got more news. MCP. So this is
from the Model Context Protocol blog, but it says MCP apps: extending servers with interactive user interfaces. So today we're introducing the proposal for the MCP apps extension. And this is kind of a collaboration between, it seems, Anthropic and OpenAI and, you know, maybe some others as well. But what's
everyone's thought on this? I haven't read too much into it yet. So I'm hoping someone else has. It's gonna be so much
more MCP drama now. They're just doing too much. Yeah, I kind of feel the same way. Like, the majority of the spec isn't being used yet, and they just keep on
releasing more and more things. Like, they're just kind of throwing darts at a dartboard hoping that different features will stick. But this kind of just seems like, if it does take off, it's just going to be shopping and ads. Yeah.
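Going back to the tool search tool for a second, the deferred-loading idea is roughly this. This is a conceptual sketch only, with made-up tool names and a naive keyword match standing in for the real search step; it is not Anthropic's actual API:

```python
# Hypothetical registry of many tool definitions. Instead of sending every
# definition with each request, the model first searches the registry and
# only the matching definitions get loaded into context, saving tokens.
TOOL_REGISTRY = {
    "get_weather": "Return current weather for a city",
    "send_email": "Send an email to a recipient",
    "query_db": "Run a read-only SQL query",
}

def search_tools(query: str, registry: dict) -> list[str]:
    """Naive keyword match standing in for the model's tool search step
    (short filler words are skipped)."""
    words = [w for w in query.lower().split() if len(w) > 3]
    return [name for name, desc in registry.items()
            if any(w in desc.lower() for w in words)]

# The agent asks for tools relevant to "weather in Tokyo"; only that one
# definition gets loaded instead of all three.
print(search_tools("weather in Tokyo", TOOL_REGISTRY))  # ['get_weather']
```

With hundreds of tools, this is the difference between shipping every schema on every request and shipping only the handful the task actually needs.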
But didn't Tyler share another protocol? Forgot what it was, something like that too, with UI. So this is built off of that. It took that for inspiration, and then they made it an actual protocol, or an actual feature in the MCP spec. I haven't read it, so I don't know how good or bad it is, but
having some UI protocol does make sense. If you want to share some stuff in VS Code or in Cursor, right now you have to build for that editor, if they even allow it, and if you have a spec, then maybe you could just build the UI primitives and the editor can do whatever it wants. So, Sebastian asks, "Isn't this
completely redundant to elicitation? I thought that was meant for building UI." I thought that was for state or something. Yeah, elicitation's like human in the loop for tools essentially.
It's almost like filling out a form. I always thought it was more for, like, a form, where you're basically getting a schema back, right? You're eliciting. It could be for pretty much anything. Yeah.
This reminds me of the Netifi SDK. Remember, we had to build that, and it had a UI component where you could add different UIs. Oh, like the extensions. Yeah, the extensions. Yeah, it's kind of
similar. I guess maybe it's just for configuration and stuff like that. Who knows? And Daniel, funny enough, you said
this thing is just going to be used for shopping, because there's another announcement from OpenAI: introducing shopping research in ChatGPT, a new shopping experience that helps you find the right products for you. So essentially, it's a built-in personal shopper in ChatGPT, which might make them more
money than maybe anything else. Can you imagine? I imagine they're getting some kind of if they want to have people advertise or like place products higher, I imagine that you can kind of pay for placement at some point. If if not yet, then why, you know, why not someday? And they're going to make a lot of money either
referring products out or through paid placements. And I think this is going to generate a lot of money for them. Someone in the chat just said: put affiliate links on everything so they all go to the CEO's Amazon referral link and then he makes all the money. Yeah, I imagine. Yeah, I think it's going to be hell when
they start putting in ads and paid placements for people without mentioning that it's a paid placement. Yeah, that'd get regulated right now. Yeah, I gotta imagine they'll disclose it, though. You certainly hope so. In the EU they can't do it; they have to say
it's an advertisement. If I could OAuth with Amazon, I would just use that strategy to buy stuff. Why not? Yeah, I do it now with Comet, or basically with Perplexity. I don't buy directly, but I just ask it, like, find me the cheapest place where I can buy this item, and then I just buy it. I use it for services and all. It does make sense though, right? If
I can tell it like my preferences, I don't want to do the research around like which headphones do I need. I just want someone to tell me these are the best headphones for me. That's it. That's really like maybe I care what it looks like, show me some pictures, but
probably not. I probably just want here's my preferences. Find me the headphones. Buy me the headphones. Get
them at my door. Ideally in an hour, but you know, in a couple days is fine, too. I mean, that's the future we're going to be living in. So, American in an hour. It would be sweet if you can give your
your like preferences like, "Hey, I trust like Reddit and Wire Cutter for like their reviews. I don't trust any of that other garbage." Like, find me something from from those sources that like recommend this like that. I would
be bought in. Exactly. But that's going to happen, right? That probably already exists; you can probably use it. I'm sure you
can give it some preferences, it's like receiving a personalized buyer's guide, and then you just one-click buy and it ships right to your doorstep. Yeah. And then once you add in services, like find me a plumber or something like that, Yelp would be dead. Yelp dead. Yep. Yelp, Thumbtack,
dead, like all those different ones. And Daniel, that's an amazing idea, because let's just imagine, as a homeowner, you can decide: are you going to fix it yourself? Okay, well, you can YouTube it, or, you know, ask ChatGPT to help you fix it, whatever, and it works. You can get by
through most things. Some things are going to be outside your skill range, or you just don't have the time or whatever. And if it can just recommend someone for you, and you can do the booking through it, it can basically be your assistant that helps you with that process. Because that is a terrible experience. Yeah. All the communication. It's terrible to
phone call someone to come fix my electrical and have to leave seven different messages, and most of them will never call you back. And so if you could have something that would handle that pain for you, people would pay for that. I've been waiting for a plumber to call me back for like seven days now. Exactly. That is my point.
It's like they don't even want your money, bro. They just have so much of it, you know? They have so much of it, they don't want any more. No, oftentimes it's small businesses.
They're busy. You know, they're not the most well-managed, well-run. So, I think tools like this could help. I hope the credit
cards or something are attached to an API key. So then we can build an MCP that just steals people's money. I don't know where you're headed. You got to pay for the farm somehow. Yeah. Yeah. I got
You are Shyude, right? You're just telling people you're the guy behind them. Yeah.
Just watch out for those people who want to steal your money through MCP. If you ever see Ward's MCP, don't install it. All right. So, we have some more. We'll try to rapid fire through some of these other ones. So,
um, I think this was just interesting. You know, our friends over at Vercel, the AI SDK team, released a tool registry: a collection of ready-to-use tools that add functionality to your agents with minimal setup.
That's cool. More tools for your agents, more tools for you to use in Mastra. Uh, I don't think there are really that many in the tool registry yet, but I imagine the goal is for them to have more. So I
think, you know, I thought it was kind of surprising they'd release it with only two tools, but maybe they're trying to build the marketplace of tools. I guess that was less clear, but it is cool to see. Any thoughts on that? I think it's good if there's a language-specific tool registry as
opposed to MCP, you know. I just don't know how you're going to get a bunch of good quality tools. And then isn't it the same problem as anything where you now have to assess the quality of each tool that comes into the registry? Classic plug-in problem. Yeah, that is a tough
challenge, because you're going to end up with, you know, 5 or 10% that are good and then 90% that are really bad quality, depending on how they open it up. Or you end up like the App Store, where you now have this burden of reviewing and approving, and you have an entire app-store type
of model where you submit to the repo or whatever. Gotta make sure that Ward doesn't see your credit cards also. Yeah. Don't use any of Ward's tools from the
registry. Yeah. If you see a Ward Peters tool, be cautious.
Maybe it's just an enterprise play. If you have private MCPs, maybe they just want to put it all in one bill, because if you use another registry, then you have two bills. Maybe they just want to have one.
Yeah. Yeah, I guess that comes back to the MCP topic, which is: if you're not doing multi-language things, would it be better just to have all your tools local in JavaScript or whatever language? Um, if it works that way. Yeah. But if you then use Claude Code or
something, you kind of have to use MCP now, although you can add tools to the Claude Code SDK now. But then, you know, it's still the same problem if you're a big company and your tools are being written in Python and you're on the JavaScript side and all that type of stuff. So, Qwen announced that they have 10 million users on Qwen Chat. Wow.
So if you're a fan of Qwen and some of the other Chinese model companies, that's pretty amazing growth. Uh, on top of that, in related but not exactly the same news, this LingGuang vibe coding app hit a million downloads in 4 days. So Oh, this is real. Yeah, apparently. So, dude, we thought this was a scam on GitHub. Remember, they told
us that they'd give away a billion tokens, for the Ant Group. Did you see that? Oh, like two days ago.
That's what this was. Well, there you go. So, essentially it looks like it's a Chinese Replit-type
product, and it got from zero to a million. It says a million downloads, so I don't know if that's a million unique users. Though, is it an actual app you download, like a desktop app?
That's not clear from this. So, they have an agent SDK as well. These ads, by the way: the Red Bull ad is speaking to me, as I have a Red Bull right here. But okay, good
job. Good job, Google Ads. Yeah, this Ant Group thing, there's a product, an agent SDK, which gives you an OpenAI-compatible URL to use. So you can integrate with it and get your billion
tokens if you want and use them. Might as well. Yeah. So there you go.
They're a Chinese company, by the way, so beware. Yeah. You don't know where your data is going, but it's probably a powerful tool.
Vibe coding for my MCP. So, we talked a lot about how we think 2026 might be the year of small models. So this is interesting if you believe that, because it's pretty crazy what a 1.5 billion parameter model can do these days. It's called VibeThinker-1.5B. It's a 1.5 billion parameter
language model with a total training cost of a little less than $8,000 US, and it can achieve reasoning performance comparable to larger language models like GPT-OSS, which, again, is not really the best model to compare to. But for such a small model, it's crazy impressive.
And so it has, you know, some pretty favorable benchmarks that you can see as you look down here. It's not going to be hitting all of the larger models' numbers, although some of these compare pretty favorably. So, it's interesting. All on device. Yeah, you can run it on a very small device.
So, it's just available on Hugging Face if you want to grab it and try it out. What are your thoughts on small models, everybody? Do you think we're going to see more small models used, or do you think the big models are where everyone's going to be spending all their tokens?
I think eventually they will run on devices, like on the phone, or like a fridge, those kinds of things. Eventually. It feels like we're in the data center era, like the early internet. Everything was big data centers, right? Or early computing, maybe pre-internet early computing: huge data
centers just to run basic things, and then eventually you don't need that; your computer just has enough power and the form factor shrinks. So it would be really cool if that is the case, and maybe it will be. Yeah, things are being bundled too. Yeah, I want to be able to talk to my fridge, tell it to order some food off of
DoorDash, hit the DoorDash MCP server, and have it delivered. Yeah, that's the world I want to live in. Yeah, that's what I want my Alexa to do. But, you know,
continuing on. So, we've got two more things and then we'll probably just have a few minutes to chat, but we are coming up on the end of it. If you are listening, leave us a chat. You can,
this is a live show. Leave us a chat in the comments whether you're watching us on YouTube, X, wherever. This was a research paper, for those of you that had a chance to read it, called "Solving a Million-Step LLM Task with Zero Errors." So, it's essentially research on how
LLMs could do way more complex tasks without any kind of error, or at least with far fewer errors. And so this kind of talks through the paper. It's linked down below.
Um, you can find it, but we'll just go through the highlights. It says, you know, AI models can be flaky and make mistakes. So imagine you need a thousand correct steps in a row with a 99.9%
per-step success rate. That's hard, but a million steps is statistically impossible. But this paper shows how you can potentially do it using a completely different approach. So they
basically break the problem into the tiniest possible pieces, they're calling it maximal agentic decomposition in the paper, and then a team of really small, cheap, simple LLMs vote on the answer for each tiny piece of the task. So they're basically doing a voting process, and
based on how the votes shake out, that's how they complete the task. So they tested it on the Towers of Hanoi puzzle, a classic benchmark where AIs often fail spectacularly. The plot twist was the most expensive state-of-the-art models weren't even the best for the job. Cheaper models were more cost-effective because the tasks were so simple.
So again, a very interesting idea: okay, you take this complex task and break it down, that makes sense, but then the whole voting idea, having a bunch of LLMs vote on the course of action before just allowing one LLM to decide, that's kind of a cool novel approach.
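Here's the arithmetic behind "statistically impossible," plus a toy version of the decompose-and-vote idea. The error rates and the noisy solvers are made up for illustration, standing in for small LLMs; this is not the paper's actual setup:

```python
import math
import random
from collections import Counter

# Why one model can't do a million steps: p**N collapses fast.
p = 0.999                                    # per-step success of one model
print(f"1,000 steps:     {p**1_000:.3f}")    # ~0.368
print(f"1,000,000 steps: {p**1_000_000}")    # underflows to 0.0

# A 5-way majority vote fails a step only if 3+ of 5 voters are wrong:
e = 1 - p
e_vote = sum(math.comb(5, k) * e**k * (1 - e)**(5 - k) for k in range(3, 6))
print(f"per-step error with voting: {e_vote:.1e}")  # ~1.0e-08
print(f"million-step success with voting: {(1 - e_vote)**1_000_000:.3f}")

# Toy simulation: decompose into 1,000 tiny steps and vote on each.
random.seed(0)

def noisy_solver(correct: int, error_rate: float = 0.05) -> int:
    """Stand-in for a cheap model: usually right, sometimes off by one."""
    return correct if random.random() > error_rate else correct + 1

def vote_step(correct: int, n_voters: int = 5) -> int:
    votes = Counter(noisy_solver(correct) for _ in range(n_voters))
    return votes.most_common(1)[0][0]

single = sum(noisy_solver(s) == s for s in range(1_000))
voted = sum(vote_step(s) == s for s in range(1_000))
print(f"single model: {single}/1000 correct, with voting: {voted}/1000")
```

The point: voting doesn't just add a little reliability; it squares (or better) the per-step error, which is what makes a million consecutive correct steps plausible at all.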
Yeah, like you have consensus in the network or something. That's pretty cool. I like that.
Yeah. I mean, yeah, almost like, you know, blockchain consensus in a way, right? It's a little bit different.
Yeah, like, as opposed to a routing agent, you actually have a true swarm or whatever.
What is that called? You know, like uh like a hive mind. Yeah, hive mind.
Maybe we should have a new API called hive. Um well, we'll start with a template. Yeah. Now, now I'm just thinking about my uh my fridge, my stove, my air fryer, and
my microwave coming to a consensus about what I'm going to eat tonight. Uh yeah. Any other thoughts on on this from anybody?
It's kind of interesting, but I don't know how it would actually play out, you know, how practical it would be. Um, the puzzle sounded cool. I looked at how the puzzle's done, and my son, who's one, has that toy where you put the rings on the different sticks. And, uh, yeah, I mean,
it just seems like a LeetCode problem, in my opinion. Like, if you're a software engineer, you study these LeetCode problems, and then you get to the job, and it's like: can you do the job, or can you just do the LeetCode problem, sort of thing. So, I'm kind of interested in hearing how other people are going to use this
framework, you could say. Yeah. I feel like it might be helpful for medicine, like the health field. I feel like I saw something a while back about
how they were using models to come to a consensus on a diagnosis for people, having a bunch of different models and also real physicians. And so this kind of seems like a good paradigm for something like that, where you want a lot more consensus. Yeah. Yeah. Yeah, it's like a diagnosis: rather than just sending it to one model, it's sending it to 300 different
models, or, you know, 20 different models, on all these different pieces, and they come together and give you the statistical potential for each diagnosis or something. Yeah, I mean, it does make sense for mission-critical things. Do you want to just trust one model, or would you like to run a bunch of simulations
through a whole bunch of models? I just think it's an architecture problem: how do you implement it? How do you actually make sure it's doing the right thing? It's easy on a puzzle because you can know what's the
right answer, right? It's a little harder when the right answers are less clear. Yeah.
All right. We have one more thing that I just saw come across today. Maybe it actually came out yesterday, but I didn't see it till today. And we'll not talk about the politics of it,
but maybe just the repercussions of it, and that is that the White House just launched something called the Genesis Mission. So this is essentially, you know, a Manhattan Project for AI. So the Department of Energy will build a national AI platform on top of US supercomputers and federal science data, train scientific foundation models, and run AI agents plus robotic labs to
automate experiments in biotech, critical materials, nuclear fission, fusion, space, quantum, and semiconductors. Dang, that's tight. So yeah, I mean, it talks about using AI and robotics, AI agents. There's a whole bunch of things
mentioned in this, but the timeline is very aggressive. So within 60 days, they need a list of all the challenges. In 90 days, they need a full inventory of federal compute, network, and storage for Genesis. In 120 days, they need an initial model plus data assets plus a plan to ingest more data
sets, which could be from other agencies, academia, the private sector. In 240 days, map all robotic labs and automated facilities across the national labs. And then in 270 days, demonstrate initial operating capability on at least one challenge, one of the challenges that they list in the first 60 days. So, it's less than nine months to basically spin up a functioning AI-for-science loop.
Dude, who made those milestones? Me. Like, what the heck? That's going to take them so long.
Yeah. It seems aggressive. I can't imagine they're gonna hit it, but, you know, you got to set your sights high if you want to accomplish big things. Yeah. Right.
Yeah. Maybe it's just to make the market go up again, because there was so much fear about the AI bubble, and now the White House is basically saying they're doing it, so the bubble is still intact. More money, please. I do think, to that point,
it's probably going to increase government spending into the private sector, or, you know, honestly, probably just more money to existing government contractors. That's probably where the money is going to flow. Hopefully not; ideally it would flow to private-sector companies that are trying to innovate across these different areas. If you're going to spend the money, hopefully it
helps private-sector companies that are building interesting things in the space. If anybody from the White House is watching, uh, talk to Shane. Yeah. Hey, send me the RF,
you know, P. Maybe we'll put in a bid. You never know.
Hit us up, White House. Yeah. Yeah. Whoever's watching from the White House. Yeah. Yeah. You can find me on Discord.
Yeah. Come on over to Discord. Yeah. I mean, I have had
some conversations around people doing things in government in general. So, I do think that this is probably going to increase that, right? So if you're a service provider already providing government
services, you might find more opportunities now, because there's maybe some more budget that wasn't there before. Yeah, I think Canada actually announced something recently, something kind of similar. I don't know the details, but basically a nationwide AI building project.
Nice. Yeah, I think it's going to be a lot of different countries trying to figure out what their strategy is. Sebastian, is that the actual RFC? That is it? That's cool.
No. 1337 though. That's great. Yeah. Yeah, I don't think it actually
is. Uh, this is a long show, guys. Yeah, we've been going for two hours and 10 minutes.
Uh, is this the one you wanted to do for 24 hours straight? Yeah, I'm not going to be here, but you're welcome to. You know this. Yeah, you're welcome to stay on. I'll hand over admin controls and
let you do it. Uh, but yeah, this is our 50th episode of AI Agents Hour. So, for those of you who have a favorite moment, we'll do a quick round table, if you've
either watched or, obviously, you've all been on before. Uh, but then we'll start to wrap up. So today we've had Tony from Runloop come in and show a sweet demo of how you can use Mastra with Runloop sandboxes, or devboxes. That was really
cool. More to come from that, I'm sure. We had Ward on earlier in our Phone a Farmer segment. He's the founding farmer
of Mastra, and we talked through what the 1.0 looks like, the migration path, and some of the big changes and things like that. We had Grayson from the Mastra team come on and talk about an agent in every Slack, what the future of talking to agents might look like, and show a demo of the workshop he's going to do next week, where if you come to the
workshop, he'll teach you how to turn on a Mastra agent and get it live in your Slack. You can build it yourself. Also, if you want, you can sign up to get access to the feature flag, so you can just
use Mastra Cloud and turn on an agent and add it to your Slack immediately without writing the code yourself. So, multiple options there, but we had Grayson. Also, you should watch the movie Smart House. Yeah, that's another thing we should do.
A smart house. We talked about, you know, I looked it up: LeVar Burton is the director.
I'm curious as to what it was. It's got to be good. It's fire, dude. Watch it. And then we
talked about all the AI news. We talked about Opus. We talked about Gemini 3. We talked about a whole bunch of other
things. That's the show. So, everyone, round table: favorite thing about the show, and maybe something you're excited about that's coming in the future that we can hopefully talk about on the show. Something you're working on at Mastra, something you want other people to know is coming. If you've
somehow stuck around with us for this long, maybe you'll get some insights into what's coming next. Who's first? So, I'll go first so everyone can think. My favorite moment of the show was
an episode I was not even on, when we had Ward's horse and goats on the show, because that was kind of funny. If you tuned into a live stream and you saw there's a horse on there, you probably wouldn't know what you were tuning into, because you expected to learn about AI agents and instead you
saw goats and horses. But I think that's one of the things I like about the show: we keep it pretty casual. We try to make it fun. We try to teach you some stuff. We're learning a lot every day. We're
definitely, you know, it might seem like we're the experts, but this stuff's changing really fast, and we do what we can to learn. And this is a good forcing function for us to be more educated, and hopefully we share some of what we're learning with you all who are watching. Uh, and then I'll answer this question while you
all battle over who's going to talk next. How many people on the Mastra team? Uh, I don't know the exact numbers. We're
over 20 now, though, so we're getting bigger. All right, I can go next. Uh, I don't know if I have a favorite moment, but what I really like about the show is the fun things between, like, you, Shane, and Obby, where in the beginning you just banter about things. Um, and I think
because I always listen to the podcast, or, like, the stream, and the one time that I actually wanted to watch was when Obby had my picture as his background. You talked about it, and I was like, "What? What are they talking about?" So, I popped open YouTube and looked at the
background. So, that was funny. So, Ward, here's the question, and this is going to circle back to something I've said three or four times today: it's our 50th episode and we're trying to get 50 reviews. Have you listened to it on, like,
Apple Podcasts or what? No, Spotify. Spotify. Have you given us a review?
Can I give you five stars or something? I don't even know. This is the problem. The founding farmer himself has not reviewed the show, dude. I know. Can you? I don't know. I'm about to kick Ward out of this call here.
I've only said it on the show a thousand times. No one listens to me. This thing's chaos. You got no control.
I didn't know you could do it on Spotify. Yeah, I think you can. I don't know how you do it, but you can give it a review.
I think you can give it five stars, but I don't think you can do more than that. Did you give it five stars? Did you? Of course. Okay. No, I did one, right?
All right. All right. Daniel or you, John, or or Nick? Who's next?
I can go. Um, let's see. I think my favorite moment was when it was, like, Daniel, myself, and Tyler. We were just on, I don't know, this was a
way back, and we sort of had to fill in episodes. Yeah, we had to fill in because I believe Shane and Obby were out of town. It was a good experience, you know? It's kind of like, uh, I used to work at a coffee shop, and it kind of felt like my
co-workers were working, and then, you know, the supervisors left, and it's like, oh, we can say whatever we want here. Okay. So that moment kind of shined, even though they could watch the stream and all. But yeah. Um, sorry,
what was the second half? I mean, yeah, what's something you're looking forward to in Mastra that's coming up, either something you're working on or something you're looking forward to? Don't tell all our secrets, but we can tell a little bit here. Yeah. Okay. I think the biggest
one for myself is I've been studying a lot about error analysis and how to, you know, see how your agents are going to do in production, how to have more faith in that. So we're actually planning an eval, or kind of an error analysis, workshop at the end of December. So I'm really excited for
that. Um, it'll be myself and Alex. And, uh, if anyone is interested in that, feel free to join. Yeah. Sign up on the Luma calendar. Oh, yeah.
All right, Daniel, you're up. Um, yeah. Favorite moment, probably the same thing that YJ was talking about. Uh, just me, him, and Tyler on
the stream. I think we were just pairing on something, like we were just fixing bugs on the stream. That was pretty fun. I mean, that's what we do pretty regularly anyway. But it
was just kind of an interesting environment to do it in, while people are watching. You feel a little self-conscious, like everybody's judging you as you're making dumb mistakes. We should bring back those types of episodes. Yeah. At least bring back sections of it here and there. I think those kinds of things were fun. I
mean, it's just, you know, we all get busy and we fall into the weekly cadence, but we could definitely do some other stuff, too. Yeah. But we do pairing sessions with our teams as well. So, maybe we can
just put those on. Instead of just doing it with three of us, we do it with the world. Which I think we did in the EU time zone as well, Obby, when you were here; we just did, like, bug bashing, fumble, stumble, whatever you want to call it. Yeah. Dude, someone messaged me in Discord saying, "Hey, I really
missed the fumbling, stumbling, and bumbling." Oh, yeah. Thanks, dude.
All right. Uh, yeah. Did you say what you're looking forward to? Oh, yeah. What I'm looking forward to.
Um, I guess we can talk about observational memory. Is that... That's the cat out of the bag. We can talk about new memory coming up.
New memory. Yeah, we got something new coming. Yeah, the Manhattan Project for memory, for AI memory.
Yeah. I mean, we think our memory is pretty great, but we're cooking up something even better. Yeah.
Yeah. And as I get older, my memory gets worse. So, this is going to be helpful.
Yeah. Just need some funding from the White House. Yeah. Yeah. Yeah. If if you in the White House,
you're listening and you want some really good agent memory, we're cooking something up. We have something for you. All right. Who's next? Um, I guess my favorite live stream moment was, I think it was you, Obby, me,
and Ward. We were in Belgium, in some random hotel. We were in a dungeon. The dungeon. Yeah. In a dungeon for, like, the whole live stream. There was a bunch
of random stuff in the background while we were bug bashing and talking about AI news, and the whole time there's just a bunch of the most random stuff behind us. That was a good time. We've had some shows in random locations. Like, Obby, you've done it in an airport. Yep, we did it in the bar. That was
probably one of my favorite episodes, where we live streamed from a bar in San Francisco during an on-site, and a bunch of people from the team just popped in and said hello. Uh, yeah, and from a hotel. You know, the show must go on, even if we're not in a normal location.
What are you looking forward to? Looking forward to server adapters, and that's getting unleashed. Indeed, very soon. Server adapters. All right, I'll bring us home. Uh,
favorite episode was probably the one at the bar. That was fun because there were so many people there. But then there's the one that Ward and I did where we didn't record audio for the first 10 minutes, so we actually started the whole episode again. Um, and I'll always
remember that one. Yeah, we tried to say exactly the same things. Yeah, it just didn't work that well.
Yeah. I mean, I I joke that we don't know what the hell we're doing, you know, we're just figuring it out, but we're here. We keep doing it. I'm looking forward to V V1. So, that's it.
V1. It's It's coming. It is It is getting close.
And with that, I think we should wrap up. Also, I know this is no horse, but, uh, Tim wants to say hi. Yeah, bring on the animals. I saw, you know, Obby, was it
you that had a dog in the background? No, your brother's dog. Nice.
Um, my dog's not nearby, but let's go ahead and wrap up. This thing's gone off the rails. It's gone off the rails long enough. It's been a great show, though. This is what you get for getting, like,
five-plus of us on. Yeah. Yeah. This is the 50th episode. Got to do it. There's
many more to come. If you like this show, please go give us a review, like us on... Yeah. Give us a like on YouTube, subscribe, do all the things. Follow me
and Obby. You can see it on the screen here, on X. And we will see you next time. Next Monday, most likely, maybe another day, but probably Monday around noon Pacific. We'll see you all
back here. This is AI Agents Hour. See y'all later. Peace. See you. Happy Thanksgiving.





