
Replit's Agent 3, Mastra Templates, Security Issues, and AI News

September 16, 2025

Special EU edition stream! Agent 3, Mastra Templates, Security Issues, AI News

Guests in this episode

Alex Booker
Mastra

Ward Peeters
Mastra

Episode Transcript

4:10

We're live. What's up? Welcome to AI Agents Hour. We are in Europe, the EU

4:17

edition today. I'm Obby. We have some special new co-hosts today. Uh, I think

4:23

you all remember Mr. Ward Peeters. Hello. And we have another new face, Alex. Say

4:29

what's up to the viewers here. Hey, how's it going, everybody? Now, by popular demand: last live stream, a lot of y'all were pro-Alex.

4:40

They were just, you know, singing his praises. So, we had to bring him on the stream. Uh, so yeah, welcome. It's your

4:45

first live stream with us, right, Alex? First of all, I think that's greatly overstated, but I really appreciate it nevertheless. And yeah, my first time on the AI Agents Hour podcast.

4:58

Nice. Well, welcome. Uh, if y'all are watching for the first time and just checking in, put something in the chat.

5:05

We monitor the chat. This is a live show. Um, we are also available on, you know, Spotify, Apple Podcasts. If you want to hook us up, leave us a five-star review. If you're not going to

5:16

leave a five-star review, then, I don't know, do something better with your life. Um, five stars only. Uh, we have a really packed show today. Um, but before we get into that, why are we, or why am I,

5:30

in Europe, and where are we going next week? We're going to, um, AI Paris. Um, what's the exact name of the conference? I forgot. Is it AI Engineer Paris? You got it

5:41

right. Uh so yeah, we'll be there two days or at least two days at the conference and then uh some days in Paris itself. I think we're doing a meetup on Monday. I think

5:55

I don't know, one of those days. There'll be a Luma event soon. So if you all want to drink with us, uh, definitely come by.

6:02

Alex, you're coming to the Paris conference, too, right? Yeah, I'm stoked. I get the train in on Monday. Nice. Nice. Uh, I think for a lot of us, it'll be the first time we've all met in

6:12

person, especially on the European side. Um, so some of your favorite Mastras will be there, some new faces, too. So, if anyone's going to that conference, just hit us up and, you know, we'll definitely meet up with you. We'll have

6:26

books, of course, and we'll be wearing Mastra shirts, so we'll be there. And yeah, that's that. Bring your spicy takes on how we can improve the framework.

6:37

Yeah, spicy takes only. Um, also we got a chat message here, which is great: "Great product, using it in production, helped a lot in developing a WhatsApp healthcare agent. Looking forward to

6:49

future releases or teasers." Um, thank you. That's awesome. Uh, let's get

6:57

started with the show then. Um, so we had some big things happen last week that are honestly, like, company-changing: when other people start leveraging your technology in their applications. And so I think the big thing here is,

7:16

last week Agent 3 was released by Replit, alongside a $250 million raise for their Series C. And Agent 3 has a lot of features; we're going to go through that very shortly. But there's one

7:30

interesting thing about Agent 3: there's Mastra in it. Um, and so we wanted to demo what Agent 3 is to everybody, and we might show where the Mastra bits are. So, uh, I'll give Alex the floor here and let's play with this thing. It's right off the

7:48

press. So, yeah. Okay, let's do it. So, what do you reckon? Let me share my screen and then

7:55

we can experience it together, because I've had a little play but I'm still discovering some parts of Agent 3. So Agent 3, really interesting by the way, is an evolution of Agent 2 by Replit, which was only released about 3 months ago, which I think gives us a sense of the pace things are moving at. It's essentially a coding agent. I think it's

8:22

found a bit of a footing among vibe coders. If you're not particularly technical, but you want to build a really quick and really nice web application, you can click on the little web button here and you can prompt Agent 3 to create a website for you. It has some really cool features that maybe we can look at. We can also look at this

8:40

new shiny beta tab here called Agents and Automations, which is a fairly unique feature, I think, because whereas with a web app you're putting something in front of your user, agents and automations have triggers, basically, and they almost exclusively run in the background. And what I really like about Agent 3 so far is that they've thought through use cases. So instead of giving you everything, which as we know

9:02

sometimes means agents can go off the rails if you don't give them any parameters, Agent 3 gives you some predefined triggers. For example, you can run a workflow in response to a Slack message or a Telegram message. You can also create time-based workflows, which run at a particular interval or a particular time of day. Um, Obby and Ward, what do you

9:20

think? Where should we start here? Um where do you want to start first or should we get it right into it? No, web app.

9:31

Let's do a web app. Yeah. Okay. Okay. Um well obviously we're coming to Paris.

9:39

How different is it with, like, um, Bolt.new and, uh, v0? Just want to see the outcome of it. So, we're going to Paris next week. We barely remember the name of the

9:51

conference, apparently. And we're not even sure what date our meetup is. But nevertheless, we can try and create a landing page so that people, uh, can learn a bit more about it. So, let's ask it to create a landing page. You know what? I've been trying to get out of the

10:04

habit of typing, and I've been trying to use Super Whisper more. So, let me run this quickly and test it works. Yeah.

10:12

Create a landing page for Mastra's upcoming meetup in Paris. It's going to be on the upcoming Monday and everybody is invited. We're hosting it adjacent to the AI Engineer Paris conference and we want you to bring your spicy takes. Tell

10:29

us how we can improve. You can keep the good stuff to yourself. We give it some, like, direction.

10:46

Say again, Obby. Oh, I just said Super Whisper is tight. Yeah, I'm also doing them a bit of a disservice because I'm actually using Flow. I've been alternating between Flow and Super Whisper. Um, but Super Whisper would

10:59

always, with my microphone setup, pick the wrong input, so I switched to Flow. Um, what do you reckon, Ward? Can we give it some input about how it should look? Is there anything creative you want to see on this page? We're

11:10

trying to put this through its paces. Remember, I have no idea. French style. We got to incorporate the French flag. Oh, yeah.

11:23

The Eiffel Tower. Oh, yeah. French style. Um, Eiffel Tower.

11:28

Uh, of course, now I have to spell that. Thanks, Ward. Um, and uh, maybe give it like a futuristic tech vibe as well. So maybe it can infuse the traditional. Um because one

11:43

thing that's really interesting: I think where some agents get praise, in particular these types of vibe coding agents, is some seem a bit better at creating beautiful websites, whereas many seem good at producing functional websites. So let's see if this produces something, um, beautiful. Oh, it's even giving us these... I'm kind of exploring

12:02

this with you guys. There's a theme drop down which looks interesting. Maybe we'll give it sand garden. Um it's also

12:08

giving us an option to improve the prompt. Let me copy this as a backup option and then we can improve it and if it's good we'll go forward. If not we'll go back. This is a really nice touch in a lot of coding agents because by refining the prompt at this step you end

12:21

up making everything more efficient, just by making it more explicit up front. In this case, it's added some labels that might help the agent produce something that looks even better. Let's give that a whirl. I

12:34

wonder how long this will take. One thing I've heard about Agent 3 is that I think it's designed for you not to, like, watch it, cuz the execution, at least for one dude on Twitter, was like 50 minutes, but then he got, like, a bunch of cool stuff built. Um, so we'll see how quick it is, because I know, like, v0, they're all pretty quick to get you to the first thing, you

13:01

know, but they usually only do, like, one page, and the setup is probably create-next-app or something, and then they just create a page. So that's probably the difference. Yeah, but Agent 3 won't create a Next.js

13:15

app. That's for sure. Whatever it is, v0 creates Next.js apps.

13:23

It says here it's made with React and Node.js. I really love this. There's two things that stand out to me as being

13:29

really cool about this and potentially a bit unique as well. First is that it's prompting us to do a planning phase. So now we can alter the plan if we want. I don't feel particularly compelled to do that right now, but when we get into

13:40

building a workflow or an agent (we're building an agent with an agent, basically, in a little bit), um, this might be more opportune, because we might want to give more instructions about how to do things. Right now, this looks good. It's really cool that you can either build the entire app or start with the design. I think that in the

13:59

modern age of web development, we do take a UI-driven approach. The UI almost dictates what the back end should be. I think that's become a popular idea, and so starting with the design could be quite efficient here. Um, I think we

14:12

should just build the entire app, and if I was a bit better prepared I might have... uh, and this is a nice shout out to Replit as well. They have a mobile app where I think the functionality is on par. I'm not 100% sure, but if so, that's awesome. And you can also scan this QR code and

14:28

get notified when it's finished, 20 minutes, 50 minutes later, however long it takes. Um, so yeah, what do you reckon, Obby? Should we let this run for a few minutes and then come back to it? Yeah, let's start the other, the

14:42

other agent and automation, at the same time, or we can see. Oh yeah, I love that. That's such a plus here, because it's all running in the cloud. Um, we can run multiple, uh, agents

14:53

at the same time. And I like that it's preserving the state here. You can see that... well, actually, I was playing with something earlier and it's waiting for my input. It's currently working on the "Paris AI Connect." It's even dubbed our

15:04

event with a name. That's kind of interesting. Um, but yeah, let's get into the agents and automation parts. I

15:10

think, to keep things... I don't know. I played with this a little bit earlier and did time-based, because I felt kind of comfortable that it wouldn't need any kind of connecting up with Telegram or Slack. Um, we can do that and play it safe, or we can try building a Slack message event trigger and learn it together.

15:28

Let's do time-based, because I don't want to get my Slack API keys and app ID and all that. When I played with this too, I set up the Slack one, and you have to have a Slack app to do it, which is fine. It's just I don't want to expose any keys. And with the agent one, it is going to

15:48

ask you for, um, like, an OpenAI key. So yeah, luckily I've planned a little bit ahead for that. I just stopped sharing my screen to make sure I wasn't going to share a key at some point, cuz what I've done actually is created a prompt already. Um, so I live in London. I'm not passing through

16:14

Europe. I live here, and I'm always looking for new and interesting things to do in the city, and if I ask ChatGPT what to do, it says visit Big Ben, go to the Eiffel Tower. That's boring. Like, I've

16:25

done that. I want to know about like new events and I don't want to miss out. I don't want it to be Friday night and learn that something amazing is happening on Saturday, but I've missed tickets because I didn't know. And so I

16:35

thought maybe we could build an agent to basically send a personalized email every Monday morning featuring activities for the coming week so that I can better plan. What I really like the idea of is adapting this for different cities. So when we're in Paris next week, we could just replace London with Paris and go from there.

16:53

Sounds dope. So let's fire this off as an agent and automation. Once again, it's following that same plan-build workflow, which I think is going to be especially important here because, you know, us being technical, or at least in my case somewhat technical, we can, like, gauge if this is going in the right direction and

17:12

tweak it a little bit. And what I'm really, really stoked to show you, and it'll all be revealed in just a minute, is that this coding agent is building an agent, and that agent is essentially a Mastra agent, and we can show you a bit more about that in a minute. I wonder what models are running under Agent 3. Is it like a mixed-model approach? Are they using just Sonnet?

17:38

Guess Sonnet, probably Sonnet. Maybe planning is a different one; maybe it's GPT-5, and then coding might be Sonnet. Yeah,

17:49

that's a great question. Um, I wondered the same thing. I don't know for sure, but I know that when they released Agent 2, it was in partnership with Anthropic's Claude 3.7. So I think it's safe to

18:00

assume they're building on those, um, yeah, same models. Yeah, maybe they use, like, Opus for planning and then a cheaper one for coding. Or maybe they're about to change everything to use GPT-5 Codex or... Yeah. There's so many different models now for

18:17

this type of work. So, it's interesting. Yeah. And if you read Reddit or

18:23

something, it's all over the place: some people like Claude, some people like, uh, the Codex one. I haven't seen people agreeing on one nowadays. Yeah, it's a mix.

18:35

Yeah, I'm happy with my Opus. I'm not sure if I have the best sort of resource to pull up and show. Um, but one thing I do want to point out that's really cool: obviously we have an app type, we said that. It's also identified an

18:54

integration called Replit Mail. And I think this is super powerful for two reasons. Like, one, naturally your agent is going to want to interact with the outside world in some way. Emailing is such a common use case. Replit

19:07

anticipated it and provided an integration. So you don't have to go and set up, like, Resend or SendGrid or some transactional email service with a separate key. It's pretty convenient for vibe coding. Um, and I also think we'll

19:20

see a bit more of this, uh, when we come back to the Paris meetup website, but Replit also has this ability to deploy the website. And so having the integration native like this will make deployment super easy. Um, so yeah, here's our plan. You never leave the platform, then.

19:38

No, unlike Lovable, and well, unlike Lovable and v0 technically, but they're all kind of, like, hidden away, right? At least with, like, Lovable, they're deploying to someone else's infrastructure. Yeah. v0, obviously Next.js on Vercel. I mean, sorry, but yeah, Replit has already

19:58

had this architecture for so long, because they weren't always an agent builder company. They've come a long way actually from where they started to now. Even their databases: they provision databases for your projects. It's not like they have to use Supabase, right?

20:14

They're just giving you PG, here you go. Because back in the day, y'all, Replit used to be, like, a sandbox execution environment to teach you how to code. And

20:26

then it evolved into you can build apps in it. And then now your agent can build those apps. So for you learning how to code, you needed, like, a REPL to execute all that stuff in a sandbox environment. And so when the age of agents came here, their architecture just shifted so naturally. Unlike others

20:46

who are building new components to support this, they already had all the components. And, like, things like this Mastra integration: we have all these dependencies for components, and they were like, "No, we're good. We can just do that. We can just hook that up." And I'm like, "All right, cool. Thank you." I

21:02

think I used Replit in the past. Didn't know it was the same Replit as before. Yeah. Okay. They've been doing that for a while,

21:08

like 10 years, let's say. Yeah. Because I used them to do like small snippets and then just run it to see what the execution was.

21:16

Okay. Remember that time in the JS ecosystem when we had, like, CodeSandbox? JSBin, JSBin and then Replit... people would use different things for reproductions. They'd be like, oh, look at my Replit for the repro.

21:29

Yeah. No one does that anymore. And, like, CodeSandbox got acquired by Together AI. So, like, the industry has changed since the

21:40

beginning. Everyone wants sandboxes. Yeah. Sorry that was a tangent. Alex, go ahead.

21:46

No, I love that context. I think Replit have been, like, so well positioned, and they've capitalized on this amazingly. And I also agree. Like, I knew Replit as a read-evaluate-print-loop thing from years and years ago, and it was always in the zeitgeist. I would always see it featured on podcasts, you know. But lately it's

22:05

hard, it's impossible, to ignore their success and some of the things they've been doing, including this agent builder, which has come up with a plan, but says it will produce a weekly automated email via web scraping. I don't feel great about this, because it seems a bit generic, like it's not very specific, and I really want to make sure that it doesn't just call the LLM. It might be tempted to ask the

22:30

LLM for suggestions, but then I'm going to go up Tower Bridge for the 10th time, and I don't really want to do that. So, let's make sure it actually tries to extract some content. Um, which to be honest is a bit of a challenge because the way I would build this if I was building a workflow in Mastra would be to use an MCP server from Exa or some

22:47

other like web searching uh service and lean on its power to do this. I don't know if the agent builder will do that or if it will try and roll something by its own hand. Let's find out.
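For readers following along, hand-rolling that scrape would look roughly like the sketch below: fetch the page and parse it with Cheerio. The URL and selectors are invented for illustration; real ones depend entirely on the target site's markup.

```ts
import * as cheerio from "cheerio";

// Hypothetical scraper step: pull event titles off a listings page.
async function scrapeEventTitles(url: string): Promise<string[]> {
  const res = await fetch(url);
  const html = await res.text();
  const $ = cheerio.load(html);
  // The selector here is a guess; inspect the real page to pick the right one.
  return $("article h3")
    .map((_, el) => $(el).text().trim())
    .get()
    .filter((title) => title.length > 0);
}
```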

22:59

I hope it does. It'll be like, "Replit web search integration will be installed." Okay. So, would you like to approve the initial plan? And interestingly, because, like I mentioned, these are all

23:13

about back end really. These run in the background. Um, it doesn't even make sense to start with design. So, nice little thoughtful touch. I appreciate

23:19

that. And it's cool that it has a web scraper. So, that's tight.

23:26

Yeah. Yeah. Also nice that they show the same buttons but grayed out, as unusable, because sometimes you just see one button and then you're like, where's the other button? Because I'm

23:37

used to it; the UX is consistent. Yeah. So I like that too. There's something kind of cool here in

23:43

the corner. Can you see that? Yep. Oops.

23:50

They need to work on their responsive... While this finishes, maybe, Obby, you can tell us a bit more about, like, where Replit ends and Mastra begins, like what's going on here behind the scenes. Do you know? Yeah. So, um, behind the scenes, it

24:09

first is, like, doing a mastra init to create the Mastra project. And you can actually see that it's already, uh, started creating the basic files. They're using our Postgres adapters for storage. So that's their link into Mastra,

24:26

and they have their own versions of all of our main components. They have their own logger, they have the storage. Now for workflow execution, they're using Inngest, which is surprising, not surprising, because we built it and stuff, but it's, like, our first production use case of the Inngest adapter. We knew that

24:46

people would use it, but now they're using it and we have to make it better. So that's great. Um, but you can see that they've installed the Inngest client, and then they are essentially registering a bunch of different workflows. The main thing that the agent is creating is an agentic workflow, with agents as steps or, you know, code execution as steps, and that's really cool, and then they use Inngest to

25:12

kind of actually run all these things. Um, and under the hood too, their agent has access to Mastra context: the docs and examples and everything like that. Um, so do you know if they're using our MCP, or just doing their own? No. And I think we have a file here. They have a compiled, like,

25:39

"how to write these types of agents" file and example configurations. So yeah, a very clever way of implementing this, and similar to what we were doing as well. And I guess we're going to talk about that later in the show, but yeah, we did have some drama on release day for this, but we got it through.
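For anyone curious what that wiring looks like, here is a minimal sketch of a Mastra instance using the Postgres storage adapter and the Inngest workflow adapter. Names and env vars are illustrative, and the adapter API may differ slightly by version.

```ts
import { Mastra } from "@mastra/core";
import { PostgresStore } from "@mastra/pg";
import { Inngest } from "inngest";
import { init } from "@mastra/inngest";

// Inngest client that will actually execute workflow runs.
const inngest = new Inngest({ id: "my-app" });

// The adapter swaps Mastra's default workflow engine for Inngest;
// steps and workflows are then defined with these helpers.
const { createWorkflow, createStep } = init(inngest);

export const mastra = new Mastra({
  // Postgres adapter for memory, traces, and workflow state.
  storage: new PostgresStore({
    connectionString: process.env.DATABASE_URL!, // hypothetical env var
  }),
  // workflows: { ...workflows built with the Inngest-backed helpers },
});
```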

26:08

I reckon we let this run for another 10 minutes and then we come back to it after the next segment. We can also take a peek at the Paris landing page to see how that got along. There's a couple more features I can show you then as well. Yeah, let's go check the Paris one and then we can uh can always come back to these things.

26:28

Oh, okay. We have something. It's still running, but we can see a preview. Um, it's French, even.

26:38

We made it too French, dude. Is that our logo up there, too? Or is that just... Oh, no. That's just our overlay.

26:54

Cool. So, one thing I've noticed for sure is that this seems to be an agent network, and it has agents dedicated to different roles. For example, there's an architect agent that comes in a bit later in the generation process to check that everything is architected well,

27:12

essentially and that the code is maintainable and things like that. I think that's cool. I've not delved into this with any great depth, but something that I think is new with agent 3 and possibly quite unique to agent 3 as well is that it's actually testing the website in a real browser. And we get a little preview of that here where it's

27:32

using some kind of agent to interact with the, uh, browser simulation in order to make sure everything works end to end. And presumably the reason it's a third of the width like that is because it's doing some responsive testing as well. I think that is sick. That's the kind of testing you need, to get input to relay to the agent for the next step to make it even

27:50

better. Yeah. and they have their own browser execution environment uh as a just a natural progression of their architecture. Uh they didn't they're not they're not using anything off the shelf

28:02

for that. So that's cool. They have a lot of tech that obviously you can see here they have a lot of tech built in.

28:10

One thing: I was quite surprised, you know, because I haven't really used these vibe coding tools that much. Like, I've played with Bolt a little bit, Lovable a little bit. Um, but I'm a bit stubborn sometimes. I want to code it

28:21

myself, or I'll use coding agents for specific tasks. And I have a few friends who are entrepreneurial, and they told me, wow, this just costs too much, and I was like, but it's, like, 20 bucks, 30 bucks a month or something. Um, but actually the cost racks up very quickly, and part of that honestly is because sometimes a lot

28:38

of the work is redundant, like with bugs: if the agent produces a bug, you have to pay for the agent to fix its own mistake. And in this case, I'm kind of drawn to the fact that it's all in French, and we have to wait for the agent to finish before changing anything. I just get the feeling that's going to use more tokens than necessary.

28:56

I wonder if we can course-correct a little bit, because with some coding agents you have to wait for it to finish before you can give new input, but in some you can actually say, um, make the text English, and it will... Ah, so it's allowed me to write, but it's queued it. That's interesting. You can stop what it's doing, I guess, and then

29:15

I mean, why not? Right. But isn't it the same as a regular dev team? If you go to a dev team and say, build me a website with that prompt, you will see a website in

29:27

two weeks. It's just cheaper this way, I guess. So I think it depends on what company you are, because if you're a big company, that's how it goes. You do the sprint of two weeks, you get your result, and then you say, you did it wrong, basically. True, true. I mean, agencies love to have

29:52

revisions, right? Because you get paid. Yeah. Every two weeks or something, right? Yeah.

29:58

At least they give you good visibility into what's going on, like the cost. But what's really interesting is that the plan I'm on is 25 bucks a month (it's 15 bucks right now with a promotion), um, but 25 bucks of

30:11

credit. Uh, you're paying for 25 bucks of credit per month. And I do think you eat it up pretty quickly and then you have to upgrade to get anywhere. So worth keeping in mind. But yeah, I'm

30:23

stoked to see the agent come to life. Then we'll come back to that bit later. Then Claude Code?

30:29

Claude Code depends on your plan, right? Like the $200 plan. But let's say you just do tokens, like if you pay credits... oh, that's expensive. Yeah, it gets expensive. So isn't it then maybe on par? And I

30:41

guess Replit is maybe a little bit more expensive that way, just because of all the UI and all the sauce you get. So I guess that's why it's a bit more expensive. But I still think, like, Claude

30:56

Code, just as a subscription, is too cheap. It will go up eventually. It's nuts. I've seen so many screenshots of people spending $200 for, like, $3,000 worth of token usage. Um,

31:11

but I think Anthropic will always be cheaper, just because they don't have to make a margin. Basically, you don't get the same UI. I'm guessing maybe one day, but I guess Replit will always be a bit better. The same thing as if you use Netlify or Vercel: it's more expensive than AWS, because it just

31:33

sucks to use AWS raw. Yeah, it's a good point. Yeah, the economics of these platforms are interesting, because, like, Lovable got everyone at the $20-a-month plan, and I remember everyone's using it heavily, then you run out of your base credits, then you go to the $60 plan, you use it heavily again. Um, and that's because at the time everyone's, like, even

31:58

the bug fixes, the changing the French into English... everyone's just doing that because of the novelty. And now you're on these new plans where you pay 100 bucks, 200 bucks, 500 bucks. Like, there's just so much room there, and they're going to capitalize on people who don't know how to code, you know? Yeah. So, if you do know how to code, then

32:17

that's good, because you can vibe code and fix the bugs. Like, if I'm changing CSS to blue, I'm not asking the agent to do that. I'm not going to pay a dollar for that. Yeah, I could do it myself. But I know a lot of people who would

32:30

just continue prompting. Um, especially the people who are not really engineers or programmers. They just have an idea, but they don't really know what they're doing. They just keep prompting until it looks good. Yeah, that's where they're going to capitalize. And I'm guessing also um

32:47

maybe in the future, like, Lovable or whatever will use some cheap models first (and maybe they waste some money on that if it doesn't give you the right result) and then they go more expensive. But I think from the start they just use super cheap ones. Like, if you look at Sonnet 3.5, it's

33:07

not a bad model, but if you compare it to Opus 4.1 or even, um, the regular Sonnet 4, it is worse, but it does its job pretty well. So it might be cheaper to iterate on a shittier model. Yeah. And then if you

33:25

know what you're doing... but then again, why would you want to use 3.5 when you can use 4? But you don't get the option. Lovable does it for you. So you don't see it, but you

33:35

still think you're using the best model. I think it's the same thing as, like, Cursor. If you use Auto, sometimes it's super shitty, sometimes it's good, because they probably rotate

33:48

models, maybe based on their credits. Maybe they have some, like, "I still have some leftover Gemini credits, let me use that instead of the Sonnet one" or something. I mean, it's a smart way of doing it, right? Because they don't have a

34:01

model of their own. Maybe eventually it's the same thing as electricity: if you use it in the down hours, it's cheaper. Maybe it's the same thing here; I could see this getting cheaper in the down hours too. Or, yeah, like, no vibe coding from 9 to 11 a.m., too expensive during the 9-to-5. So you have

34:23

to pay a premium. So all the vibe coders just stay up at night to vibe. Oh, we got a message from Paul here: "I missed the start of the stream. Who beat Ward up?"

34:35

All right, Ward, you got to explain your black eye to everybody. Yeah, because I didn't punch him. I swear. Yeah, I was um like putting a carriage

34:43

behind my bike to, uh, get the kids to daycare, basically. And it's folded, so you have to fold it up again, and I had to use force to make it click, and I was dumb enough to put my eye close to the bar. So that's why. You should have said you got into, like, a knife fight with someone and this is, like, the only injury you walked away

35:10

with. Yeah, you should see the other guy. You should see the other guy.

35:16

So while we wait for the generations, we'll bring in another Mastra here, uh, Nick. Um, because, you know, while Replit was building their Agent 3, we got a lot of people wanting to build agents with agents. So obviously at Mastra we want to explore a lot of these things. So we built a very small initial version of an agent

35:43

builder. So we'll do a little demo of that. I'll uh let's bring Nick on. So why don't we do that and uh we'll mute

35:50

ourselves. Welcome to the show, Nick. I can't help but feel that you are in the same uh bank vault.

36:43

For anyone watching, I don't think you're going insane. I can't hear Nick either. Oh, but then again, you wouldn't know, cuz you're in the same room. Well, I don't know what happened. He, uh... you had to restart Chrome or

36:58

something. Yeah, to... Okay. We couldn't hear Nick even though his lips were moving. But you wouldn't necessarily have known if you

37:03

were in the same room because you Yeah. Okay. Are you back? Yeah, I'm back now. Technical difficulties.

37:23

All right. Okay. All right. Cool. Now, can you hear me? Yeah. Welcome to the show.

37:36

Sorry about that. What up? That was bad. Okay, now I can

37:41

show my screen and show off the template builder. So, I'll just show it off right now. Let's try to start this thing up. So, basically, our template builder is a way for users to bring things in. If they have a Mastra project, they can go to our playground. Zoom. Zoom in. Got it.

38:07

Okay. So if they have a Mastra project, they can go to this templates page and decide, oh, I want to have some pre-existing agents, tools, and workflows; I just want to try playing around with them. Um, so they can look at any of these templates, like a coding agent, Google Sheet

38:26

analysis, flashcards from PDFs. They have all these different GitHub repos that contain different agents and tools that they can merge in if they want. Um, so we have the ability to take a template and just merge it in using our template builder workflow. So for this example, I'm going to go with the chat-with-CSV one.

38:51

Um, so right now each template gives, like, a brief description of what it does and what kind of tools and workflows you're going to be bringing in. So we're going to bring in the CSV tool. We have a CSV agent. We have a CSV-to-questions workflow. And then we have a few

39:07

different providers we can use um for downloading this template. So just to start out, I'm going to pick for this example OpenAI. Uh I'm going to select the provider. The provider is um basically we use agents

39:25

inside the workflow. Um, and that provider basically selects which model we want to use for these agents. And I'll show that code later once we get to that point. So for this example I'm

39:37

going to use GPT-4.1, and then we have some environment variables here as well. So we have, like, a model that we want to set in our .env file for our Mastra project. So we have, like, an OpenAI API key and a model variable. So, I'm going

39:53

to stop sharing just real quick and add in something right there, and then start the install process. While that goes on, I'll come back and share my screen again. Okay. So while this is going on: right now we have a few

40:15

different steps within our template builder workflow. We have clone template, analyze package... um, we're discovering units. So basically, for cloning the template, we clone straight

40:28

from the GitHub repo. We analyze the package.json of the template to see the different dependencies. Um,

40:37

for discover units, we discover the different template units, so, like, different tools and workflows from the folder. We order the units based on different internal weights, like which ones should take priority. Um, we prepare the branch,

40:56

which is where we make sure there's a GitHub branch just for this template, so that all these changes don't affect the main branch or whatever current branch you're on, and then we start merging packages. We install and then we copy files over. Um, and then we have certain steps, like

41:18

intelligent merge, and validate and fix. They use agents to make sure that we're merging template files together properly and updating them to match the format of the existing project. So while it's in its validation step, I shall just show the code real quick for the

41:44

template builder. I have a question when you're ready, Nick. Yeah, I'm ready. I've used the... so I know if you go to mastra.ai/templates,

42:01

you can clone and install templates from scratch. How is this different from that? It looks like it's doing something a bit more intelligent with those final steps there than just starting from scratch.

42:11

Yeah. So (oh, this one's done) rather than... like, you can just clone from a template, but that creates a new folder, a new project, and then you'd want to bring it into an existing project. Yeah. Say you already have existing agents and existing tools. You would

42:31

normally have to manually clone this and then bring it over to your existing Mastra project, and it's a lot of manual copy-paste things you have to do. With our template builder, you're able to have an existing project and then just press one click to install, and it brings everything into your current

42:55

project. So without you having to manually um make any changes yourself. So you just install and it does it all for you. Oh, that's fantastic.
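As a rough illustration of the shape Nick is describing, here is what a chained-steps workflow looks like in Mastra. The step names and bodies are stand-ins, not the real template-builder code.

```ts
import { createWorkflow, createStep } from "@mastra/core/workflows";
import { z } from "zod";

// Hypothetical first step: clone the template repo into a working dir.
const cloneTemplate = createStep({
  id: "clone-template",
  inputSchema: z.object({ repoUrl: z.string() }),
  outputSchema: z.object({ templateDir: z.string() }),
  execute: async ({ inputData }) => {
    console.log(`cloning ${inputData.repoUrl}...`); // git clone would go here
    return { templateDir: "/tmp/template" };
  },
});

// Hypothetical second step: read the template's package.json.
const analyzePackage = createStep({
  id: "analyze-package",
  inputSchema: z.object({ templateDir: z.string() }),
  outputSchema: z.object({ dependencies: z.array(z.string()) }),
  execute: async ({ inputData }) => {
    console.log(`reading ${inputData.templateDir}/package.json...`);
    return { dependencies: [] }; // real parsing would go here
  },
});

export const templateBuilder = createWorkflow({
  id: "template-builder",
  inputSchema: z.object({ repoUrl: z.string() }),
  outputSchema: z.object({ dependencies: z.array(z.string()) }),
})
  .then(cloneTemplate)
  .then(analyzePackage)
  .commit();
```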

43:08

So I can show you this off. Um, but I'll show the code real quick just to go a little more in detail. So this is our current workflow. Just,

43:18

um, we have an agent builder template workflow, all the steps right here, and this all happens in order and uses internal tools. So I can show, like, the clone template step: basically, we have some internal utility functions that will, you know, clone the repository, make sure all the files are correct, and then we have, like, editing for these

43:54

files as well. Um... is there anything specific we should show off for the agent builder? Our main agent is this agent builder class. So we basically create an agent that has,

44:13

like, a bunch of internal tools, as well as some memory and some processing. Um, the main thing for the agent builder is these tools. So I'll show the tools right now. We have a bunch of tools for the template builder: we have, like, read file, write file, list directory,

44:33

execute shell commands. We have a task manager to handle internal tasks, such as knowing what files to merge into our Mastra project, as well as validation errors and what things need to be fixed. Um, we have a multi-edit tool that lets us edit multiple lines at once and multiple

45:00

files at once. We have a replace-lines tool that takes multiple lines in a file and then does an outright replace of those lines. Then we have validate code, which basically does a validation on a specific file and lets us know what errors exist for that file, and then those errors get

45:27

added to our task manager for the agent to fix in the validation step. Can you share the instructions of the agent? Yeah, the instructions of the agent. So,

45:42

um, the main instructions for the agent are right here. Let me go to the instructions right here. Oops. I'll find that real quick. Default instructions. So our main agent

45:54

builder instructions: we have these pretty comprehensive instructions for it. Make that bigger. Okay. So it's like, you're a Mastra expert agent specialized in

46:07

building production-ready AI applications using the Mastra framework. And then: it excels at creating agents, tools, workflows, and complete applications with real working implementations. And then we basically give it a lot of different capabilities, and give it its role to

46:25

transform natural language requirements into working Mastra applications. It has a deep knowledge of Mastra patterns and is able to produce production-ready code. And we basically outlined the workflow it has access to. So it'll follow this sequence for every

46:44

coding task: it'll use the manage-project tool to create a new project and then go through each of these bullet points to make sure it properly creates the project. So it'll use information gathering; it'll use tools to understand patterns

47:04

and APIs. It'll analyze the existing codebase for, like, naming standards and formatting. It has access to search for packages, examples, and solutions, and then it can ask clarifying questions if any information is missing. How long did it take to write that prompt, or instruction, or how many

47:31

iterations? It was a lot of iterations. I think as we were writing, as we were going through and testing everything, we just kept adding things we wanted it to do. Are there models you would say don't use with the agent builder, like ones that do not work properly? Yeah, I'd say, um, even with... like, we have a lot of

47:54

instructions and a lot of tools that kind of help with this process, but a lot of weaker models, like 4o-mini, are not very good with this. It just ends up going in loops. Um, Gemini models, not particularly great. Um, we've had a lot of good results with

48:16

4.1 and, like, 5. Um, and basically a lot of our testing has been with those. But, um, yeah, when you have a good model, I think a lot of it is really

48:29

just... even with all the tools and all the instructions you give it, having a really good model just helps speed this process along and makes it a lot faster. Cool.
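To make that concrete, here is a stripped-down sketch of the shape Nick walked through: an Agent wired up with internal tools, memory, and long instructions. The tool, the instruction text, and the model choice are illustrative stand-ins for the real agent-builder internals.

```ts
import { Agent } from "@mastra/core/agent";
import { createTool } from "@mastra/core/tools";
import { Memory } from "@mastra/memory";
import { openai } from "@ai-sdk/openai";
import { readFile } from "node:fs/promises";
import { z } from "zod";

// One illustrative internal tool; the real builder has many more
// (write file, list directory, execute shell, multi-edit, validate code...).
const readFileTool = createTool({
  id: "read-file",
  description: "Read a UTF-8 text file from the project and return its contents.",
  inputSchema: z.object({ path: z.string() }),
  outputSchema: z.object({ contents: z.string() }),
  execute: async ({ context }) => ({
    contents: await readFile(context.path, "utf8"),
  }),
});

export const agentBuilder = new Agent({
  name: "agent-builder",
  instructions:
    "You are a Mastra expert agent specialized in building production-ready " +
    "AI applications using the Mastra framework. ...", // abridged
  model: openai("gpt-4.1"),
  tools: { readFileTool },
  memory: new Memory(),
});
```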

48:46

All right. Let's, uh... let's move you out of here. So when we started doing agent builders, or were looking at it, we wanted it first to serve the templates, right? We had just done the Mastra template hackathon. There's a whole bunch of

49:05

templates and you know a lot built by Alex and co. And the first thing we thought was like yeah templates are cool when you're starting from scratch but like what if you just wanted to add a PDF agent from a template into your thing because you need a PDF agent. And that's when we were like, okay, you could do this without LLMs like we did

49:27

in the past, or we could start building this primitive in service of the templates and then expand to more things, right? Oftentimes we get asked, oh, I would love it if I could just talk to Mastra to build my workflows. It's like, yeah,

49:44

dude, that is a good idea. Um, and so essentially we're going to keep working on this primitive. If people want to build agents using agents, then soon you'll be able to import the agent builder into your own application, and maybe you can run your own Lovable or whatever, if that's what you want to do, or, like, an Agent 3 type of thing. Um,

50:07

because our next foray after templates is to use the agent builder to build workflows for you. That is our next milestone for the agent builder. So yeah, this is all very new to us. Um, thankfully the patterns

50:23

already exist from all the other products. So we'll just be bringing that to open source. But it's also nice, for people who are building, to just look at the prompts: how are we doing it? What's our

50:36

success? And then iterate on it and build your own agent builder. Because I think that's what's lacking a little bit: you see Lovable and you see v0, and then you see prompt leaks, but you're never sure that they're actually leaks or just something an LLM came up with. Um, so it's good to see, like, a

50:53

working example in the wild. There was a whole era of Twitter culture in AI (and I say "an era" like it happened years ago; it was earlier this year) where every day people were

51:05

just leaking prompts, or attempting to leak prompts, because a lot of people wanted to figure out how these products actually work, and they wanted to build. Most people don't want to build this themselves; that's most users' problem. But they want to know how it works, right? And so our

51:24

prompt is open source. You can look it up. It's not the final prompt. I'm sure we'll keep changing it over time, but it

51:31

definitely is not just a hello-world prompt, you know; it definitely had some experimentation in there. Um, Alex, question for you. Having essentially, you know, fathered a lot of these templates and brought them into the ecosystem, what do you think about where we're going with all this stuff?

51:49

Well, I mean, when I first joined Mastra a few months ago, by the way, there were some existing templates, and while it's very productive to read about the primitives in the docs and build your understanding, um, that's not normally the question people have. It's, like, how do I bring this stuff together? And so the templates in their own right

52:07

are a great learning resource. And now we're getting a bit meta, where we're looking at the implementation of the agent that Nick built. Um, which I think again serves in demonstrating how to bring some of these things together.

52:19

It's very productive. Yeah. Honestly, agent builder is probably a future template, right? So

52:25

then it's even more meta. Uh, which is great. While you were demoing there, Nick, we got a lot of comments we should go to. Um, Marvin said, "What's up?" What's

52:37

up, Marvin? Bonjour. Bonjour. And you know, Paul is just chatting with

52:43

Marvin in the chat. That's funny. They could just do it on Slack, but uh Paul's worried about, you know, it's not a work-related uh black eye. And I don't

52:55

know how to pronounce your name because it's in a language I can't read, but: building from templates is the right direction, otherwise it's too freeform; front end should be too. All right. I mean, I get

53:10

what you're saying, bro. So, thank you very much. Um, to come back to Paul's comment, it's basically what HR told me to say. Yeah. Right.

53:24

Hey, you know, we don't have an HR department. Just kidding. Obby's the HR department.

53:30

I'm the one who inflicted it. So, yeah. Anyway, um, cool. Should we check Okay. No, it's not like we're live or

53:36

anything. Should we check back on our Agent 3 stuff? Yeah, let's do it. I was having a little peek. So, let's look

53:49

at the Paris meetup website. If you recall, we stopped it, then we resumed it with some guidance to turn it into English. Um, it didn't do that. And I can't really fault the agent for doing

54:02

this. And I think it is tempting a lot of the time to get frustrated with coding agents when I give them really lazy prompts. I think agents are always trying to please us by doing something productive. And in this case, it's

54:14

built an email form which requires SendGrid. And this isn't really what I was going for, so I might have to go back to the drawing board for this one. We were also looking at the London

54:26

Creator. We didn't call it London Creator; the agent gave it a name, which I always appreciate, because naming stuff is hard. Another really nice feature of the agent builder: well, you might have thought offscreen I pasted that API key so nobody can steal it, but actually you can configure secrets in Replit and it will automatically use them when

54:46

needed. In this case, it knew my OpenAI API key already. It's also, you know (and this is totally appropriate), using an environment variable for the user email in this case. Um,

54:57

and I'm just gonna try my best to type my email without being able to see it... um, alex@mastra.ai. Maybe that UX could be improved. It's

55:10

not really a secret, I suppose. But, um, yeah, let's say continue here and see what its first port of call is. We'll say no, we don't want to save the email as a secret. So, to get us this far, it took about 8 minutes and it cost about two bucks. I'm pretty happy with that.

55:29

And I think now it's probably basically doing the finishing touches. I'm actually a bit confused, guys, because in my inbox, I actually got an email from the agent. And I don't know how it did that, cuz I hadn't told it my email; I only just did that. So that possibly could be to

55:48

do with the other, um, process I had running, but that one was waiting for input. So I'm not really sure what happened there. I'm a little bit confused. Um, but you know,

56:00

the nice thing about understanding some TypeScript is that we can poke around and get a sense of whether it's going in the right direction. I always say, when trying to understand a Mastra workflow, a good place to start is the tools, because the tools are kind of self-contained and they give you the building blocks that the agent's going to use. And so we can have a peek here

56:19

and look at the London event scraper. It looks like it's using Cheerio (however you want to pronounce that), which probably isn't what I would have done, but it's perfectly appropriate as well. I think, uh, Cheerio

56:32

is designed for scraping web pages. So, the sources it's identified all look good to me. Um, Time Out is a popular magazine in

56:39

the UK. The Barbican is an event space. The Design Museum is an event space. Um, I

56:45

think this looks okay. Um, it looks promising at least. We've also got the email generator. And if you've been using Mastra before, this should look really familiar to you

56:56

because, well, it's using Mastra. We should be able to see that we create the tool with the same interface. Uh, a big part of tool calling is giving it a good description. I do appreciate that it's been detailed in the description. That'll make it easy for the agent to

57:09

pick the right tool for the right job. We have the input schema. I'm pretty pleased with this. Sometimes agents

57:14

produce a ton of bloat and I'm like, just dial it back a little bit. But this looks like probably the minimum necessary to build something really good. And then we've got this London... oh, sorry. The content creator was the

57:26

last tool. I have no idea what this would do, but maybe... Oh, this is interesting. It looks like it's using the AI SDK directly instead of calling into an agent. I think that's appropriate.

57:43

For the LLM call. Yeah, they're just doing, like, an object call there. Oh, yeah. Yeah, they could have used generateObject.

57:49

Yeah, I kind of get it sometimes. Um, I think when I first started learning Mastra, having to create a separate agent file to do a very focused prompt, um, when the agent didn't benefit from specific tools or memory config or something like that, I sometimes found myself wanting to do this. But I realized there are

58:06

some advantages of creating an agent as well. Like, it's a good separation of concerns, and I think it does help make things fit a bit better in the Mastra world, with tracing and things like that. So, um, it would be better, I think, if it used an agent here instead of a raw prompt.
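Roughly what that pattern looks like: a Mastra tool whose execute calls the AI SDK's generateObject directly instead of going through an agent. The schema and prompt are invented for illustration.

```ts
import { createTool } from "@mastra/core/tools";
import { generateObject } from "ai";
import { openai } from "@ai-sdk/openai";
import { z } from "zod";

export const emailGeneratorTool = createTool({
  id: "email-generator",
  description:
    "Writes a personalized weekly events email from a list of scraped events.",
  inputSchema: z.object({ events: z.array(z.string()) }),
  outputSchema: z.object({ subject: z.string(), body: z.string() }),
  execute: async ({ context }) => {
    // Direct AI SDK call: fine for a one-shot structured generation,
    // though routing through an agent buys tracing, memory, and reuse.
    const { object } = await generateObject({
      model: openai("gpt-4.1"),
      schema: z.object({ subject: z.string(), body: z.string() }),
      prompt:
        "Write a short, friendly weekly email covering these events:\n" +
        context.events.join("\n"),
    });
    return object;
  },
});
```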

58:23

We have a thing in the chat from Jonty Brook: "Apologies, I've just joined. Did you build this Mastra agent in Replit?" Yeah, Jonty, we're demoing

58:35

and trying out Agent 3, which uses Mastra under the hood, and as you can see, it did a pretty good job. Here's that database you were talking about, Obby. It's kind of cool. You can

58:47

actually inspect the data. There doesn't seem to be anything here right now, but we will see that soon. You can inspect the database. Um, I also realized, when we were talking about how, you

58:58

know, we don't always know what model these agents are using under the hood: there is a high-power option here that basically 5xes the cost but, uh, can improve the accuracy of the model. I think if I ran into an issue, like a bug, I might conditionally turn this on to solve the bug and then go back to the other mode.

59:18

Well, I hoped that we could maybe demonstrate this right away. Maybe we'll come back to it later, but I'm also happy to move along. As it happens, this email uh is pretty good. I think it

59:28

might have come from a previous little test I was doing if I'm honest. Um, but it will give you an idea of what the final product is here. I'm very happy with it. I like that it's including the dates that shows me that it's relevant. These are actually like timely events,

59:41

right, that I might be interested in going to as opposed to the generic stuff that you would get if you only asked a model. I think that's a good start. And we've not really got to this yet. But

59:54

what's really cool about this is that it's essentially a cron job, right? Like it runs on a schedule 8 a.m. every

1:00:00

Monday. When you hit publish automation, I think Replit just handles that deployment side of things for you; like, they have a notion of a cron job in their infrastructure. And so for creating individual or personalized agents, whichever nomenclature you prefer, this seems like a really good way to go. Yeah, you should click on that powered-

1:00:18

by-Mastra thing real quick. Yeah, I believe you can go to the Mastra playground for this. No, really? Really? I thought you'd have to, like, clone the code and then run it locally or

1:00:30

something. What? They host the Mastra playground as long as your shit's running. And if

1:00:36

you publish the automation, the playground, I believe, is always running. Oh, so then you can interact with whatever you've built here as well. Oh my, that needs to be more prominent. Um, I noticed they rebuilt it. It's probably

1:00:48

using the same, uh... what's it called? The library we use, React Flow. Yeah, I get the vibe they might be using React Flow because it's got the same backdrop, but it's much more, like... I mean, we've got bendy lines, guys. So, everybody likes

1:01:07

bendy lines. I think initially when we were talking to them, they wanted to incorporate the playground in that view that we came from, and I just think there are some changes that we need to make on our side to make sure that's possible. So, more to come there. I know this was, like, the first release of their UI

1:01:26

there, but, um, yeah, I did a POC I think four months ago with Lovable to do something similar, and it was just a lot of extra code they had to add to make the playground, or bits of the playground, work in it. So I think that's basically why they did it themselves. Um, we just have to make

1:01:48

it more modular, I guess. Yeah. And we have plans to make sure that they take advantage of all the stuff that we built. So then you get free updates: if we do an update in the component, they get it for

1:02:01

free. And then, uh, Jonty here, to clarify: "Are you saying Replit Agent 3 literally uses Mastra under the hood?" Indeed.

1:02:12

Indeed. This is pretty sick though because okay, let's talk about it from a business case and about making money, right? And no offense to Zapier, right? But why would I pay Zapier

1:02:24

if this continues to evolve, right, and they have more integrations, or the same amount of integrations as Zapier? I could just use this to build automations with natural language, and with Zapier everyone's doing the same. But with Replit, I could do other things other than automations. Now I can just do that in addition,

1:02:44

and custom code, right? Like, basically an if-then-else structure, which is fine for a lot of things, but if you want to change some input-outputs or something, or do something custom, Zapier doesn't allow you to do it. The only thing right now: I still feel that Agent 3 is slow, for a reason, because it generates all

1:03:10

the code and stuff. But if I just want a workflow right now, I have to wait half an hour to check if it's any good. It did it in eight minutes; we were just doing other stuff, right? Eight minutes, $2, to get a London email. That's not

1:03:25

bad. Yeah, true. And, like, how much effort did we put in? Nothing. Nothing. That's true. That's true.

1:03:30

Zapier, dude. We're coming for your milkshake, bro. Like, we're coming for you. That is true. That is true.

1:03:37

And any other automation tool, like n8n. Why do you need n8n? No offense to n8n, but, like, I could just use Replit.

1:03:43

Yeah, that's true. Now, granted, this is a very simple workflow: there's no branching, there's nothing. No, I mean, there could be, right? And yeah, the possibilities there are super sick, you know. And you have code, and you can modify it if you're a little bit of a programmer, or you hand this off to your dev team, like, hey, I did 90% of the work, you wrap it up and

1:04:07

finish it or something. Maybe it's, like, the IKEA effect or something. Um, but I feel a sense of ownership over this right now. Like, this is my utility, and I'm kind of invested, personally, because I was just looking at the creator prompt, um, and it's talking about, like, personality match. Well, I can tell it a bit more about my

1:04:26

personality, what I'm interested in. I think that's really cool. That's awesome. Do you want to do a test

1:04:32

run? Like, if you go to the workflow view, I believe there's a test run that we could run. Should we just do a little test run? Yeah. Yeah. I'm a bit nervous. I feel like it's in limbo.

1:04:45

I think that's possibly because it's working, right? Yeah, it's still working. What is it doing now? It's making moves. Burning credits.

1:04:55

Burning credits. Charging our credit card. Another comment from Jonty here: "100% agree. Zapier, Make, n8n is dead to

1:05:08

me." Hell yeah, bro. That's sick. Yeah, this is insane. I honestly didn't know you could do this. Um, and it is...

1:05:20

I hope, and I'm excited, to share these components. Like, obviously we do a lot of work to ingest the stream and render these components. Um, this is getting even better, right, with things like nested streams, where you could have a workflow running within a tool, and you can have the human input part.
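If you're building that yourself today, the raw material is the agent's stream. Here's a minimal sketch (the agent is hypothetical); a real UI would map the incoming chunks onto custom components instead of printing them.

```ts
import { Agent } from "@mastra/core/agent";
import { openai } from "@ai-sdk/openai";

// Hypothetical agent, purely for illustration.
const eventsAgent = new Agent({
  name: "events-agent",
  instructions: "You recommend timely local events.",
  model: openai("gpt-4.1"),
});

// Consume the stream; nested tool calls and workflow events arrive
// as chunks too, which is what richer UIs render incrementally.
const stream = await eventsAgent.stream("What's on in London this week?");
for await (const chunk of stream.textStream) {
  process.stdout.write(chunk);
}
```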

1:05:37

Um, if you're wanting to expose something like this to your users, you might have to build it from scratch right now. You might end up building something like this. It would be sick if we could share some of that work with uh customers so they can uh benefit from it as well. In this case, the part I'm really interested in is like being able to click on the output button when it

1:05:55

renders and get a little bit more detail, with like debugging and seeing what's going on, but it doesn't seem to want to resolve right now. Um, I think it's fair — I think because it's kind of running and doing its work, and we're also trying to play with it. I don't think it's a totally fair test, honestly. But yeah, what do you reckon

1:06:15

Obby? Should we move on? Yeah, let's move on to some news. Um, so yeah, if you're just joining us, what

1:06:21

we've pretty much been doing: we looked at Agent 3, first in the web app mode, to build a web app for our meetup in Paris. I'll give it a zero out of five in terms of what it did, which is unfortunate. And then we

1:06:38

tested the agent orchestration feature. I'll give it a three out of five. Yeah. Because it's very simple.

1:06:45

And then in my heart it's a five out of five. Of course. Of course. Yeah. And then we also have an agent builder

1:06:53

as well. Um, we kind of demoed that, and how you can take our Mastra templates and merge them into your project via our agent builder, as well as the agent builder code. We went through a little walkthrough. So, if anyone else is interested in building these types of

1:07:06

features themselves, copy our code, use it. It's open source. And now we're going to get into the AI news segment of our show. Obby, can you believe it? The second we

1:07:17

send it off, it finished. You want me to show it again? Uh, yeah. Let's give it like 20 seconds

1:07:23

to — um, it says: "Perfect. I've successfully built and deployed your advanced London event agent. Here's what I've created for you." Um, we'll just run it. So my inbox is

1:07:34

empty right now. Clearly not very important. And then we'll run the workflow here, rather than opening it in Mastra.

1:07:47

By the way, question for you both: what do you do when your coding agent is busy? Like, sometimes I just have to look something up or respond to a GitHub issue, and it feels really fluid, but other times I'm just twiddling my thumbs a bit. I feel like I'm just waiting 90 seconds, 2

1:08:04

minutes, 3 minutes for it to finish. Have you got anything to fill the time, or have you got more skilled, maybe, at filling those gaps with other work to be more productive? So anytime that I have to wait for something, I try to do something else. Like in Cursor — I'm a big Cursor user. So any of my background-type tasks

1:08:24

I use background agents for, and then I'm always being active on my main thing. And also, for the type of work that I personally do, I try to do smaller changes, so I can keep myself in the iteration loop. The minute I start getting distracted, I go on like YouTube and start watching Mr. — or Twitter. So yeah, I get super ADHD

1:08:49

that way. So, I try to make sure that I'm in the smallest iteration loop, with the smallest leash, as possible. What about you, Ward? I do the same. So, if I use like Claude Code or something in my editor, I ask

1:09:02

it small things that are fast. And if I want to do like a bug fix and just want to let Claude Code rip, then it's also a background task. So, create a new branch and run it off. Um, but yeah,

1:09:15

mostly I do the same as Obby. I just make sure that the loop between Cursor and me is small. Or I let it do something and I go into another file to change something, just to make sure that I'm not twiddling my dumb thumbs. So basically it's the same thing. And if

1:09:36

I have to wait like 30 seconds, I can quickly still open Slack, check if someone messaged me or needs some help. So, um, looks like it's failing in this one step. It attempted it multiple times.

1:09:50

Yeah. I wonder. Is that the error, or is that the email step, or — No, this is the, uh — That's a good question. I don't know, actually. It just

1:10:02

it's applied to the workflow. Interesting. I don't know how to go more detailed than that right now. Um, yeah, that's a little bit disappointing, I think, because if I'm

1:10:13

looking at this and thinking I actually don't Oh. Oh. Oh, of course.

1:10:21

Of course. The moment I speak, it changes. Um, rather than look at ingest, let me switch back to — okay, so it finished, this time with an error.

1:10:32

This is — yeah, I think it's in a bit of a state right now, because I saw it going back and forth about trying to call includes on undefined in the agent output, and it's kind of just given up and said it's done, because that issue still persists. Yeah, we'll move on now. I think we could be here all day. We'll just highlight this one last thing. We won't be running it.

1:10:53

But this is kind of part of coding. Sometimes you do run into obscurities, and it's quite nice that — imagine you're not technical, or not particularly technical — there's this "debug with agent" button that will, I'm imagining, basically give the output here as a prompt and do its thing. So there is a

1:11:12

path forward here, potentially, but just not a path we'll get to finish today. Yeah, it's probably something with that Cheerio script that I made, huh? Yeah. Cool. Well, I think everyone

1:11:24

watching, you all should try out Agent 3. There's a discount right now. They're giving it away for hella cheap. 15

1:11:30

bucks, I believe, a month. Um, and I think you can also start free. So, if you want to play around with this, please do. I mean, we don't get anything out of it other than your usage of Mastra. So, that's great. Um, but yeah,

1:11:43

let's move on to the news. So, we have a bunch of articles today to go through, but I think we're going to start with something that impacts the JavaScript ecosystem. Once again, we

1:11:55

have another security issue. It doesn't necessarily impact us, but it is important to talk about. So, let's go through it.

1:12:08

So, tinycolor — you know, you have to be some JavaScript old heads like us to know what tinycolor really is, because tinycolor has a second version, tinycolor2, that is safe from this attack. But, uh, yeah, Ward, do you want to talk about this? Yeah, just in general, I think it's so hard to mitigate these supply chain attacks

1:12:32

because you install a dependency and it has so many other dependencies behind it that you don't even know about. And you could say, I will pin this version of this node module in my project and hopefully it never gets updated, but then if you ever update your lock file, all those transitive dependencies might be updated, because

1:12:54

you have no control over them. I think it's just painful, again, that there's no real way to mitigate it. Like, Socket has its own registry that you can use to prevent that, but that's also a little harder to set up, and those things. It's just a

1:13:13

little bit sad that it happened again, and mostly, how do you say it, you can't blame the authors. I don't know how this one actually happened, but the previous one, with chalk and stuff, is basically they got phished with a "change your password" email or something, and basically that's how

1:13:36

people got into their registry and published packages. It's just so easy to get into those accounts and then publish, and then basically the whole ecosystem is messed up. So yeah, just a bit sad to see this happening again. Yeah, they pretty much did the same exact method, right?

1:14:00

The attacker, they compromise and update the package — which, I don't know, you're not even supposed to be able to do anymore. You download the package tarball, you modify package.json, you change the postinstall script to run any script that

1:14:13

you want. Then you repack and republish it. And the only way you do this is if you have the uh the owner's key, right?

1:14:21

So then you could really screw everything up, because basically no one is doing strict versions on tinycolor. No, they're doing an open caret range or something. Yeah. So any postinstall in CI will

1:14:35

trigger the bundle — this bundle.js that was made — and whatever that does. And I think what it does is actually create a GitHub workflow in your workspace, because you're already in CI, you already have the GitHub env. Oh yeah, and now you have access to everything. Now you can make some moves and

1:14:53

destroy someone's life if you really wanted to, and I believe that's what they were trying to do. So this bundle.js script was run in postinstall, and what does it do? It uses the developer's CI credentials to create a GitHub Actions workflow, and then you can get all the results at a certain webhook URL. So now you're stealing secrets, any type of secret that you have. And then, you know, GitHub Actions has a secret

1:15:21

repository that, you know, is supposed to be safe, but then if you put your env on any workflow step, you could leak the whole env. Yep. So honestly, it's kind of brilliant, dude. I'll just be honest, it is very brilliant, the way that they did

1:15:34

this. And so yeah, this is the thing that they did. And then they stole a bunch of secrets, dude. And once you steal secrets, you can then —

1:15:45

I don't want to give anyone any ideas, but you could hold people to ransom. Yeah. You could do a bunch of stuff — you could abuse their OpenAI keys or npm keys. You could delete stuff,

1:15:59

which is not good. Um, but yeah, that's usually what these attackers are trying to do: hold you to ransom and take money from you. So security is really important. Do not get phished — I mean, obviously

1:16:11

it's hard sometimes, but the two-factor auth and all that stuff we complain about most of the time — because of security, it's actually very reasonable. So yeah, unfortunate though.
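To make the pinning talk concrete, here's a minimal sketch of what hardening a project can look like with npm — the package names and versions below are placeholders, not the actual compromised packages. npm's `overrides` field (npm 8.3+) forces a transitive dependency to one audited version everywhere in the tree, so a freshly published malicious patch can't sneak in through caret ranges:

```jsonc
// package.json -- a sketch; names and versions are placeholders
{
  "dependencies": {
    "some-direct-dep": "2.1.0" // exact version, no ^ caret range
  },
  "overrides": {
    // pin a transitive dep across the whole tree, so a later
    // malicious publish can't be pulled in by an open range
    "tinycolor2": "1.6.0"
  }
}
```

Pairing that with `ignore-scripts=true` in `.npmrc` stops postinstall hooks — like the malicious bundle.js described above — from running at install time at all, with the caveat that some packages legitimately need their install scripts, so it's a trade-off.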

1:16:35

Yeah, but it makes us think at Mastra as well: should we bundle more dependencies? So don't put them in package.json, but just make them part of the bundle, so you never have to install them. And then you're safe, because the transitive dependency problem doesn't happen again — they're just not there. But it's a lot of work as well for us to keep things up to date, and I'm not sure if it's really that useful. But maybe we should do it for the CLI, because that's something people execute, and maybe core as well, the two main packages. And then if, for example, a store like

1:17:00

Pinecone gets tainted, it's only a small portion of people. Yeah, but more on that later.
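A sketch of the bundling idea Ward describes, assuming a tsup-based build (the inlined package name is just an example): `noExternal` compiles the dependency into the published output, so consumers never install it and a later malicious publish of that package can't reach them through your install.

```ts
// tsup.config.ts -- a sketch of inlining chosen deps into the published bundle
import { defineConfig } from "tsup";

export default defineConfig({
  entry: ["src/index.ts"],
  format: ["esm", "cjs"],
  // Inline these packages into the output instead of leaving them as
  // runtime dependencies in package.json; consumers never install them.
  noExternal: ["tinycolor2"],
});
```

The trade-off is exactly the one Ward names: you now have to cut a release of your own package to ship any fix in an inlined dependency.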

1:17:16

All right, next topic here. This one created some controversy, which is always good in this space: the Claude Code SDK supports custom tools and hooks directly in code. And they essentially wrote a bunch of guides on tool calling and stuff like that. Where this got interesting is where swyx came in. I don't know where swyx's thing is. Let's find that — there's like a quote retweet, I think.

1:17:41

Find his quote tweet — I have it somewhere. And swyx, you know, swyx is swyx-ing. That's all I can say about that. Um, let's share this one for

1:17:54

everybody. And then we can kind of talk about it. "Pretty sure that Claude Code SDK just destroyed every agent framework startup in existence." Holy — look at this. That was very strong. That's a very

1:18:07

strong take, which I don't agree with. But, you know, it's similar to OpenAI, right? They have their own agent framework. It's good if you use OpenAI, but I think it's the same with Anthropic — it eventually has to come to this, right? Yeah, and I'm sure they're going to support multiple models too. It's like the writing is on the wall here.

1:18:35

The thing with the Claude Code SDK: there have actually been some people who talked to us about using the Claude Code SDK as a model for their Mastra agent. And I'm sure that Claude Code SDK model is already an agent itself under the hood, right? And so I'm sure there'll be more agentic features added to this SDK, and it kind of comes down to people's preference, like,

1:19:02

should I be using a framework anymore, or should I just do everything through the Claude Code SDK? I don't know, it's not there yet. But the question is always how much effort they will put into the SDK. Would they do observability? Would they do evals exactly as you want them to? Because they have it, but how much time will they put into it? Because does it

1:19:28

mean, if they put more time into this, would they put less time into the models themselves, into, like, Claude Code? And I don't believe they would make it their star product. So I think they will always lag behind, but it's, I guess, good for building something small, or getting your feet into building agents, those kinds of things. And maybe under

1:19:52

the hood they may be using it as well. Like, if Claude Code could generate an MCP server or something, why not use their own tool that they can train Claude on? Yeah. What are your thoughts, uh, Alex?

1:20:09

Well, I had a question actually, because I'm still learning about some of these tools, and I understand that Claude Code is a coding agent, essentially. This is the Claude Code SDK. Is it specific to building coding agents, or is it more general purpose? What is

1:20:28

this equivalent to? So for now it it is like I think the word code in the cloud code part is a little misleading. It's very similar to open AI agent agent SDK. This is the

1:20:43

Claude Code SDK — it's just using the core Claude Code technology to then be an agent for you. So it has internal tools in the SDK, but now you can bring your own tools. And so if you're building your own coding agent company, you could use the Claude Code SDK as the

1:21:00

foundation and then kind of go from there. But you have to imagine everything's going to get more generic. So it's going to be a general purpose agent, and you'll be able to do more with it. And I believe Anthropic just added memory into Claude the product, which means the

1:21:20

memory primitive is probably not too far away from this, right? The model companies want to rule them all. They're like Sauron from Lord of the Rings or something — it's just the natural progression of it. And there are definitely engineering cultures out there that rock

1:21:38

with one provider in everything that they do. Think about Shopify devs. They're Shopify hardcore — they even use the Shopify UI elements, they're just into the ecosystem, right? And there are many

1:21:51

OpenAI developers that are in the OpenAI ecosystem regardless of other models out there, and those are the users who use their SDK. Look at the Cloudflare environment and how — I don't want to say it's very religious, but it's very much a movement of using all Cloudflare products all the time. And this is another thing: Anthropic products all the time. Where we sit is a

1:22:16

little interesting, because there is a world where we would get destroyed by every model company, 100% — I'm not naive. But there are people who want to control their own destiny when they're building their own products, and I think us being more flexible across all these things, and having different primitives that are for

1:22:37

application development and web development, etc. — maybe that'll make us survive the onslaught of all these model companies. But time will tell for sure. I'll resist building on the Lord of the Rings analogy. I'm not sure if that makes sense.

1:23:00

So, we have a question — or, Jonty had the same question. And Paul says: it's AI vendor lock-in. Mastra will work with most models and can be deployed anywhere. Yeah, it's not fair though, because

1:23:16

Paul's from our company. But, uh, I agree with that. AI vendor lock-in is a fear for a lot of developers, because everything's changing so fast. Like, who they follow

1:23:31

today changes as models change. I've seen people go from Anthropic fans to GPT-5, or when Codex came out, people were saying, I'm only using Codex now. And it comes down to — I think the developer psychology here is twofold. One, people want to know that what they're doing is the right thing to do, right? And when they don't know what to do, or

1:23:57

what the right thing to do is, they look to people who are experts in the field, or running a successful company or whatever, and they just borrow those opinions and make them their own. I've done it, everyone does it. That's the way to go when you don't know what you're doing, right? Then there's the second phase of this, which

1:24:15

is the world is changing so quickly that you can't necessarily follow different messiahs here. You have to kind of figure it out yourself and you have to be a little bit more flexible. So those are the people that you see that are using every tool. They're changing every day. They're like, "Oh, I'm going to use this. I'm going to use that." And that's

1:24:33

also a very valid thing. And they're forming their own opinion. And the lock in there is like a little tough, right? because you like essentially have 12 different AI tools.

1:24:44

You're burning hella money, you know, just using everything. But I don't know if there's a good answer for the general public, like, what do I do? We get that question so much: what should I do? How do I do this?

1:24:56

Start building. Yeah. Choose the framework you like the most. If it's not Mastra and you like the Claude SDK better, just

1:25:02

build something. Eventually you might grow out of it, or it's like, I'm missing these features, but maybe Mastra or LangChain has it, and then you move to — definitely not LangChain, but the other ones, for sure. Yeah. Yeah. For sure.

1:25:15

So don't think too much about which SDK to use. Just start building, and then whenever you hit production, or have an MVP or a prototype ready, then you might think, hey, maybe I should switch, because I don't want to put this in production. Yeah, there have been many people who have switched from different frameworks to us, and people who've switched from

1:25:38

using Mastra to just doing it themselves, because the thing that Mastra helped them do was understand how it works. So then they're like, "Okay, cool. Now that I understand how it should work, I want to do my own thing." And that's totally cool. That's called the graduation problem. If anyone runs

1:25:54

open source projects, you have this thing called the graduation problem where if you're writing a framework, you're essentially both doing education and writing a framework because that's where people are using you because they don't know what to do or don't want to do it themselves, right? But then people who do want to do things themselves

1:26:12

learn stuff from your framework and then they're like, "Cool, I graduated from college. Now I'm off in the real world and that's totally fine. You You're not an enemy of mine. That's for sure.

1:26:23

Anything else we should talk about on this, or should we move on? Move on, I think. All right. Well — this is interesting,

1:26:30

because I think the Claude Code SDK has been around for a few months, at least three or four months. This screenshot shows them adding the ability to define your own tools. So what did you do before, and why is this such a big unlock that people like swyx are getting excited? Yeah, so before, the SDK didn't have tool calling

1:26:52

or custom tool calling, and those hooks are much like the AI SDK hooks — onFinish, on this, on that — so you can plug into the execution of the SDK. They didn't have that before; they just added it, and that's cool, right? Because you want to start building stuff with the SDK. Okay, now you can. As opposed to everybody else, though — this already existed, right? So it's not like —

1:27:16

Right. Yeah. I mean, whatever, you know. But the built-in tools, they were things like reading a file, or something to do with writing, because it was all used by Claude Code internally, right? And I think people were looking at the power of Claude Code thinking, well, I

1:27:29

want a bit of this power, but there was no option to give it that custom tooling. Now, you can. So that's why people are getting excited.
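For reference, the hook-and-tool pattern this keeps getting compared to looks roughly like this in the Vercel AI SDK (v4-style API; the model id and the weather tool are just illustrative — this is not the Claude Code SDK's own API):

```ts
import { generateText, tool } from "ai";
import { anthropic } from "@ai-sdk/anthropic";
import { z } from "zod";

const { text } = await generateText({
  model: anthropic("claude-3-5-sonnet-latest"), // placeholder model id
  prompt: "What's the weather in Paris?",
  tools: {
    // a custom tool the model can choose to call
    getWeather: tool({
      description: "Get the current weather for a city",
      parameters: z.object({ city: z.string() }),
      execute: async ({ city }) => ({ city, tempC: 21 }), // stubbed result
    }),
  },
  maxSteps: 3, // allow tool-call steps before the final answer
  // hook into execution, much like the hooks the Claude Code SDK just added
  onStepFinish: ({ toolCalls }) => {
    console.log("tool calls this step:", toolCalls.map((c) => c.toolName));
  },
});
```

Yeah. And OpenCode also has an SDK now, too. On the same theme,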

1:27:43

everything should offer both terminal access and programmatic access. So it all makes sense, you know. Yeah. And maybe if you're building a coding agent — like, say you're Replit or Lovable

1:27:56

or something — my intuition tells me this might be a good SDK to explore for that purpose, right? Because it seems a bit more specialized. Yeah. But here's the wild part: the GPT-5 Codex model exists now, right? So now, why

1:28:15

even use a specific one when you have a better — quote unquote better, everything's with a grain of salt — but now I can bring a GPT-5 Codex model into my Mastra agent, or any other agent, and now I have like a pro coder built into the model. And with the tools that we see here — dude, I'm probably not going to use any of

1:28:35

these tools. Like, why would a coding agent need to calculate compound interest? Because it doesn't, right? It's because it's trying to become a generic agent. Yeah. Also, this API signature sucks, but I'm

1:28:47

not really caring about that. Um, yeah, but let's move on to the next topic. Okay, so we have two white papers to discuss today, but we'll start with the first one and then we'll go on from there. Let me share this one.

1:29:05

This is a coding one. Uh — oh, we need to share this. One second. So, I wish we had Professor Andy, but now you have us. So, this is from Salesforce AI Research. Um,

1:29:24

which is really cool that Salesforce is trying to get in the game of this and very fitting right now because the whole industry right now is all about context engineering. It's the new buzzword. It's the new everything. And so this is a

1:29:41

nice white paper called LoCoBench. It's a benchmark for long-context LLMs in, as they say, complex software engineering. I don't want to make fun of Salesforce, but I'm glad they added

1:29:53

"complex" there, because that's what Salesforce is. So, just a couple of talking points that I prepared for this, and we can do jump-offs. So, LoCoBench: what they do is context management across different coding languages. They have different context windows with different variations — some have like 10K

1:30:22

tokens, up to a million tokens, right? And the benchmark's tasks have different things too. They generated a bunch of files — they have 15 million lines of code. They try to make

1:30:36

realistic code bases, right? And as opposed to most benchmarks, which according to them don't test all the scenarios, LoCoBench tests like 8,000 scenarios versus less than a thousand, or whatever. Now, I always take that with a grain of salt too, because when you're writing a benchmark, there's research behind it. You have to say that you're the best, in

1:30:59

ways that are politically nice, I guess. So you always have to be a better benchmark than the others, because why else publish it? Yeah. And you want to get clout too. So yeah.

1:31:11

And so the things that came out of this were really interesting to me. Every model shows performance degradation at scale. As context windows increase, performance decreases. Which is interesting, because the people pumping context windows and no-RAG aren't really talking about performance at the million-token context mark. Because everyone

1:31:35

cares — like, when Gemini came out, people were like, oh, RAG is dead, which we also said for fun. But when you're trying to do complex tasks, or things that require that context window, the performance of the task goes down. If you're asking for the weather with a million-token window, who cares, right? And so anything

1:31:55

that actually needs that context window of that size does not score well. So I thought that was really interesting. So that's like a performance note from this. But wasn't that already known to people?

1:32:07

I thought so. I thought it was known. I thought there was benchmarks that back it up. But I guess there's another one,

1:32:13

you know. Um I think it's called context rot, right? Context rot. Yeah.

1:32:18

But as these context windows are getting bigger, you don't really hear people complaining about context rot, maybe because the window has gotten bigger, so their application can perform under the window. But I don't know anyone actually pumping one million tokens on a request, right, like on the input. Yeah. So we'll see how that goes. Um,

1:32:46

so that was the performance part. The second thing that they saw is that coding languages have a lot of bias. So Python's performance is like 2.8 out of five, which I guess is fair, right, whatever. C, 1.9

1:33:05

out of five. So system programming languages are harder because there's probably less training on systems. And then webdev languages score better because they have more training data which is something that I think everyone knew as well.

1:33:21

But then on the other side, there are all these AI maxis saying, oh, you could just use AI agents and coding agents to rewrite an old COBOL codebase from like the 1990s, right? Is that true? Probably not. I don't know

1:33:39

how much it is trained on COBOL, an older programming language. How many bugs would it have? It probably can read it and kind of do things, but how many things would it fail on or miss? Like, maybe COBOL says, if I do this, there are these side effects that I don't know of, and then the

1:34:00

programming language — or the AI — wouldn't even copy those over, so that's probably part of it. So languages like Java score well. Yeah, and there's so much Java code, enterprise Java code, out there. So if you're doing like J2EE-type work with the coding agent, it'd probably be pretty good. Ruby on Rails, or Ruby in

1:34:20

general, probably because so many companies of our era, when we started out — Twitter, Airbnb, even Netlify — use Ruby, right? A lot of people use Ruby because that was the popular language, which then builds up the training data for it.

1:34:37

JavaScript, obviously. But, you know, maybe also registries — Ruby has a registry for packages. I don't know if C has one; Java has Maven. Yeah. And so forth, and maybe that's also part

1:34:49

of it. Um, there are a lot of ways to get data. Yeah. For the web it's easiest. You don't have

1:34:56

to go to GitHub; you can just scrape a website and learn on that. Yeah. So there are biases there, which we all knew, except now there's some data — and

1:35:08

there's always been data on this. It's just interesting to see it reinforced, you know. And I guess if Salesforce comes out with it, maybe all the non-technical people read it and go, "Oh, yeah. Okay." And the other papers maybe aren't read by those kinds of people.

1:35:27

Yeah. Um some hot takes on this. So there's this thing that they mentioned in the the post called long context illusions. So, and it's kind of what we

1:35:39

said: models that claim million-token context windows are mostly marketing fluff — which is so funny coming from Salesforce, by the way, marketing fluff. But anyway, as context increases, performance decreases. I believe the number they stated was like a 29% down to 3% success rate, which is really bad. Even the best models — GPT-4,

1:36:02

I think it was like two out of five on some realistic tasks. And then this is the real hot take from this: "long context is the new AGI." I love that. I love that, because it just shows how the

1:36:15

industry like follows buzzwords. And so if long context is the new thing, it's overpromised and underdelivered. You know, those are their words. You know, those are like my take from those words.

1:36:28

Another thing they said was most AI benchmarks are toys. That's a pretty bold statement, but they have more test cases. So, you know — they were comparing to HumanEval and some other benchmarks that don't have that many problem sets for the agents or the models to fix. This one has like 8,000 realistic scenarios.

1:36:52

So, yeah. And I think the last point here — I think you should take everything with a grain of salt, but this does seem directionally correct, you know, with the analysis that we've seen from other white papers. But I think it really hits home that context engineering is important, but that doesn't mean it's a silver bullet. There is no silver bullet here, you know.

1:37:19

Yeah. Any takes, Alex? I was just flicking through the paper to see what the results were from running their own tests, and it's interesting that they say Gemini 2.5 Pro leads in overall performance and Claude Sonnet 4 is the strongest in code

1:37:37

comprehension. It's really interesting to me that you still sometimes have to pick the right model for the right task as well — that plays a factor. And they also write that GPT-5 excels at architectural understanding. Um, but

1:37:50

that makes me really interested to see what would happen. Like, surely they'll run the same benchmark against the new GPT-5 Codex, right? That'll be quite interesting to see. Yeah. Though I don't see benchmarks

1:38:02

getting updated after they're published, huh? You know, you do it for the clout and you move on. Yeah. But yeah, Salesforce — hopefully they have more things coming out. They

1:38:13

definitely have the resources to do a lot of this AI research. Also, everyone has to remember that doing a research paper takes a lot of work, right? You see how many authors are on this thing. They also have to get

1:38:25

paid for like a job too, you know? So that's like quite a bit of money to publish something. Um maybe we'll publish something one day when we have the time and or money. But I think the big takeaway is that

1:38:37

basically what they say is that all the other benchmarks do a low number of problems and this one does a lot of problems, but they come to the same conclusion, I guess. So it means this paper kind of validates all the others, even if you push it to more examples. But they spent more money, right? We spent $8,000 on our LongMemEval

1:39:04

problem set for our benchmarking. So imagine how much 8,000 problems run across different programming languages would cost. I don't know if they put the cost in here — they probably didn't — but

1:39:16

that's a lot of token usage. Maybe they invested in the seed rounds of Anthropic or OpenAI, and that's how they got all those credits. Oh, for sure. Oh, they used all their

1:39:27

Anthropic credits just for the benchmark. Yeah, that's how they got, like, billions — and the companies, half of it is Salesforce. Yeah. All right, let's move on to the

1:39:40

next one. So, I'll just do the other white paper while we're here, and then we can move on to non-white-paper stuff. So, OpenAI released a white paper on why language models hallucinate. I thought this was super fascinating. I think

1:39:59

everyone should read it. It's not a hard read — white papers are not an easy read most of the time, because they're kind of verbose, but these are not the type of PhD papers from the past, right? Research equals production in a lot of ways now. So when you're reading these

1:40:19

things, there are abstract thoughts and stuff, but they're all backed by things that are happening. Also, when we were in university and stuff, the systems programming white papers were so boring, and there would be so many equations on how to do things. This

1:40:37

is more like qualitative results that have quantitative backing. It's way easier to read an AI white paper than a biology one. So you should take advantage of that if you're technical enough. You should start

1:40:50

reading white papers. You can use Claude or OpenAI to help you digest the information. You don't even have to read it, right? You can still kind of know what's happening. But it's a good skill to learn, how to read a white paper.

1:41:02

It's like an advanced blog post, basically. Yeah — a white paper is like an advanced blog post that's posted on a different site, then cross-posted to Twitter, and you get clout from it.

1:41:15

So, you should definitely try looking into it. I want to agree with you, but if you scroll down to the bottom of this paper, it gets a bit gnarly. Like the last pages, they have all the formulas and — yeah. No, that's not too bad. Go down even further, to like page —

1:41:33

even further. Yeah — to calculate hallucination. Even lower. This is the bib, or the credits. How many

1:41:43

pages — half of it is credits. Yeah. Yeah. This stuff gets a bit gnarly, I think.

1:41:50

Yeah. I did study mathematics in college and stuff, and we had to do this on a test — like, you have a theorem to

1:41:56

prove. So, this is pretty gnarly for sure. But what it's measuring is something understandable. It's like the hallucination gap, you know. But yeah,

1:42:08

this is why they get paid the big bucks at OpenAI. Um, let's talk about this though, despite the sigmas here that you can see. Um, also, this

1:42:19

type of analysis is called real analysis, and it is mathematics, but it's more the concept of modeling reality, and that's why these equations are so gnarly and there are a lot of symbols involved. So, thing builders never have to do this — don't get scared, anyone. But, so, the whole thing

1:42:46

is why language models hallucinate. So first they talk about how there are two stages, and both have problems, right? We have pre-training and post-training. Those are the two stages of model development. Let's say, in pre-training —

1:43:03

so, models hallucinate during both. During pre-training, models hallucinate out of statistical necessity. What that means is the model can't tell the difference between true and false, so it will just generate both, right? And there's an interesting

1:43:27

reason why, which I'll get to. In post-training — and Professor Andy has come on the show to talk about post-training and reinforcement learning — the basis of both of those things is rewards. There's this thing called a reward function that you give the model, and as it answers correctly, it gets the reward. But the problem is,

1:43:49

sometimes the model will purposely figure out how to game the reward function by giving answers that look structurally correct and that invoke getting a reward — much like a human trying to cheat, right? And so when hallucinations happen in post-training, it's because the reward function doesn't necessarily know if the answer

1:44:19

is right or wrong. So it'll give the reward, right? And so that's an interesting problem. Like, how do you get over

1:44:25

it? Technically it's not necessarily getting over it. It's like a fact of life that hallucinations are like mathematically possible. And so the main thing that they say is the reason why is

1:44:38

that LLMs have this thing like student syndrome, right? They hallucinate for the same reason a student guesses on exams. Because if you guess — let's say in an exam, you only worry about the result, right? Unless you're doing mathematics and stuff, where you have to show your work. If it's A, B, C, D and you

1:45:02

don't know the answer, you could guess C on all of them, and you statistically have a 25% chance of being right. So the system rewards you regardless, right, even if you don't know what you're doing. Same thing with an LLM. The system rewards confident answers, not right answers, because a confident answer has a structure that

1:45:22

you can then reward, but it doesn't necessarily know if the sky is blue or not. You know, those are facts, right? And that's super interesting, because they tested this by asking for a certain person's birthday, and they got three different wrong answers from different state-of-the-art models. And it's because the LLM is not trained to say "I don't know."
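The exam arithmetic behind that point is simple: under binary 0/1 grading, a guess that is right with probability p scores p in expectation, while abstaining always scores 0 — so guessing strictly dominates whenever p > 0:

$$
\mathbb{E}[\text{score} \mid \text{guess}] = p \cdot 1 + (1 - p) \cdot 0 = p \;>\; 0 = \mathbb{E}[\text{score} \mid \text{``I don't know''}]
$$

Which is exactly the paper's point: the grading scheme, not the model, is what makes the confident wrong answer the optimal strategy.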

1:45:45

Yeah. Imagine if an LLM said "I don't know," though. The whole world would probably end. Instead of "You're absolutely right." Right. Like,

1:45:51

"You're absolutely right." What if it was like, "I don't know, bro"? But those three words, or whatever, destroy the ethos of an LLM. It destroys the whole point.

1:46:04

It should never say I don't know. But maybe it should though. Maybe you should say, "I'm not 100% sure, but I think it's this." Because that's what you do as a human, too. You basically say, "That's a good question. I actually

1:46:16

don't know, but if I have to answer, I would say this." Yeah. But I think it's too difficult for an LLM to do it that way. And

1:46:29

I think it's similar to, I guess, fake news and those kinds of things. It's sometimes also pretty compelling to believe what the news says, because it says something and then states, this is the reason why this is true. And with an LLM, if it gives a confident answer, you would follow it as well. So — here's a great point. "Binary evaluations of language models

1:46:54

impose a false right/wrong dichotomy," and they award no credit to answers that express uncertainty, omit dubious details, or request clarification. Which is interesting, because you would never penalize a student for seeking more knowledge, right? So, blah blah blah, "under binary grading, abstaining is strictly suboptimal":

1:47:15

"'I don't know'-type responses are maximally penalized, while an overconfident best guess is optimal." That's so interesting, dude. So interesting. It's a bit normal, right? Because they do pre-training and post-training. So

1:47:28

they basically bake a human brain and then you ask the human brain. It's the same thing if you read a textbook and it's the only textbook you have read — basically all your knowledge comes from that one. Who are you to say that

1:47:42

something is untrue? Because you might have read it in that book, but maybe the book is not 100% true or maybe it it doesn't have the the latest information. So, I kind of get it from like a programming or like a modeling standpoint. Yeah, I can get it from like a human psychology standpoint cuz I mean we're

1:48:01

all trying to emulate intelligence, right? And — I don't know — have you ever had a teacher who is a rough teacher, so you can't say "I don't know"? They'd be like, "What do you mean you don't know?" And then you start

1:48:12

like, "is it the square root of pi?" with a question mark in your voice, and they're like, "is it? or is it really?" — you know, that type of stuff. Versus a teacher who is like, oh, you don't know? Well, okay, let's look at the appendix, you should look at these things, and things like that. And so I

1:48:32

don't know — my LLM calls never hallucinate recently, and that's because I've changed how I talk to my models, by giving them all the context that I know, at least. And when it starts going off the rails and stuff, I don't give it a reward. I destroy it with my words in a very mean way. So it always says I'm absolutely right — of course, right, because I am. But, you know, those are kind of the tricks I've been

1:49:00

using, just subliminally, right? But now I feel like that is actually pretty good, and we're not even training the model, we're just using it. So yeah, I've been doing that a little bit. If I have to write some text, like an email or something, I mostly prep the email myself already, and then I just ask it, can you improve the

1:49:18

language or if you read this, how like what would you get from it and then it basically does a pretty good job of like making it better. But I hardly ever say, can you write me this? So I try to and then for coding it's more like hey can you do this and then it does half the job and then I prompt it to do things better. But if I really want to write

1:49:42

something, I do a lot of the work myself first, or I let it do research, and then I write something from that and let it improve it. Whereas in the beginning of my journey with AI and LLMs, it was like a one-sentence prompt, and it gave me something that's okay, but you can't really work with it. So I think that's my evolution as well.

1:50:07

I'm scared about this actually. I was thinking about it. This makes me really scared about hallucinations.

1:50:13

We're using coding agents. So, our hallucinations are both acceptable and preventable, right? I'm not a doctor. I'm not using a medical model.

1:50:26

And I have no way to evaluate it if I were to use one. And if a doctor was using one and they're in a different specialty, dude, I'm scared, right? What if it is hallucinating? I'm a cardiologist and

1:50:39

I'm somehow in an emergency situation. I don't know. Um, and I'm using my medical health care agent and it just tells me to do the wrong thing or because it can't say I don't know or it cannot route to the person who is actually supposed to know. That's really

1:50:57

scary. Yeah, it is. And we don't see it because we're using coding agents. Dude, our shit's not

1:51:03

going to kill someone. Maybe. I don't know. But uh I don't think it's going to kill anyone except if you're using these

1:51:10

like maybe electrical systems and all that type of stuff. You could easily build something that is completely — yeah. Yeah. That blows up after a while, or at a certain condition. And, like, OpenAI is making a life

1:51:22

sciences model. I wonder how the hallucinations will be, and what evals they're doing. Are they just taking a biology book and making that happen? That's scary, dude. It also depends a little bit, I guess, on where

1:51:35

you live in the world, because I'm pretty sure in the EU, or in Belgium, it's going to be very regulated, so you can't use a model. But I do believe that in other parts of the world they might do it, and then you just end up making it worse for people. Yeah. What are your thoughts, Alex?

1:51:57

Well, it's a very thought-provoking discussion, and thank you for explaining this in such an accessible way. I really enjoyed that, and I don't have anything to add. Nice. Well, that worked out. Um, anyone

1:52:10

watching, please read this if you want to get scared like I just got scared. Um, but if you're in the coding mentality, coding agent mentality, this is pretty interesting anyway. So, I think it's really good. Also, OpenAI

1:52:22

Research, they publish good stuff. Like, I know I talk about OpenAI a lot, but everything they do is just so well done, and you can never hate on them for that. Everything they do is just chef's kiss. So, even this. All right, let's move on to the next thing,

1:52:42

which is kind of funny. There's some story. There's some backstory on this one, too. So last week or so,

1:52:51

um, Anthropic released the MCP registry, which is awesome. I feel like it's been a long time coming for this to get out. So what is the MCP registry? It's an open catalog and API for

1:53:09

all the public MCP servers. Um — so here's a story. Story time. Back in like February, when we were starting Mastra, we had just built our first MCP client for Mastra. And the first thing that we thought of was, how do you

1:53:31

know what MCP servers are out there? At the time there were MCP server companies — and they still exist. So there was mcp.run, Smithery, Composio. Composio actually, at the time, didn't have MCP support in their product, it was just API support, but the minute MCP started taking off, they converted their libraries into an MCP server.

1:53:58

And then I think there were other original, newsletter-type companies that just maintain a list — PulseMCP was one, and then there was another MCP thing called OpenTools. Mhm. All people that we know, right? And so

1:54:17

we saw that there were so many different people trying to build on this registry concept that we were actually talking to all of them originally, to see if there's anything we could do for discoverability. That's the whole point of a registry, right? npm is a registry. The Yarn package registry.

1:54:37

Freaking Maven is a package repository, aka registry. It's for discoverability and for you to install easily into your project, right? And so at the time, when we talked to all these MCP companies, they were like, "Anthropic is working on the registry spec, but we can't necessarily wait for them. So we're just going to build our

1:54:57

products and then we'll become registry compliant after." And so yeah, so that that's what this is. We also, just to plug our meme about this whole world, at the time we were just like, there's so many registries coming out. We actually put out like a funny yet real web page

1:55:20

called the MCP registry registry. And obviously it started off as a joke, right? But it's completely real like in a sense that these are all registries that exist, you know? Um, and many

1:55:34

companies wanted to get on this page, which is cool. Hey, if you want to get on this page, open a PR. You're more than welcome to. But like, um,

1:55:43

this is the registry registry. Now, we have to add the real registry to the registry registry even more meta. But, uh, yeah, so that exists. Let's go back to this blog post here.

1:55:56

I've got a question for you, mate. Um, so I suppose with something like npm, I don't often discover a package in npm — like, I know what I'm looking for, or someone tells me to install it in a readme or something. I suppose you could discover packages that are trending or related or something. Um, when I think

1:56:15

about MCP, there is that aspect of looking something up specifically. You go to one of these registries, you look up Figma, you look up whatever to know it exists. Um, but I can't help but feel that if we want our agents to be more autonomous, like is there a story here around them being able to like explore

1:56:32

everything available to them and conditionally connect to different servers or something like that? I'm kind of curious about the bigger picture and the vision for agents and how they could utilize an official registry in this way or or maybe it is truly like npm, a tool for developers to find what they're looking for.

1:56:50

Yeah. No, that's a really good point, and that's where all this stuff is headed, right? So, Smithery, in the first kind of exploration of this, has a bunch of tools — I think it's called the Smithery box or something. And when you have a registry — so, here's the best part of a registry: you can expose search APIs to your

1:57:10

registry, and if every item in your registry has a bunch of descriptions, you now have a search product that can be built into your MCP registry — much like npm. So if you have search on your registry, now you've unlocked a tool call for an agent. And once you've

1:57:29

unlocked a tool call for an agent, now you can have reasoning to do the next step after the tool call. So Smithery allows you to do this as well, where you can search the Smithery registry — let's say I'm looking for a way to send email. Well, it can then look for everything tagged email. Both

1:57:47

registries have registry items, both have descriptions, and they also have tags, if you're trying to do any type of categorization. So you can look at all the email or productivity MCPs to narrow the list down. Then the second thing it can do is install the MCP into your Smithery box. And now the agent has access to both —

1:58:10

the — I call it the Smithery box. I forget the name, but it's part of your Smithery instance. At the moment — I googled it and I was like, I can't find this thing. It's a Smithery instance, but I just call it a box, I guess. And then it can add more MCP servers to the instance. So

1:58:30

then, when it does rediscovery on its Smithery instance, it knows it has the email tool now. And then boom, now it can use it, which is what we call — I don't know what we call it, or if we call it this, I don't know what people do — it's called skill acquisition.
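As a sketch of what that search step could look like as a Mastra tool — the registry URL and response shape here are assumptions for illustration, not a real API:

```ts
import { createTool } from "@mastra/core/tools";
import { z } from "zod";

// Hypothetical registry search endpoint -- any real registry's API will differ.
const REGISTRY_SEARCH_URL = "https://registry.example.com/v1/search";

export const searchMcpRegistry = createTool({
  id: "search-mcp-registry",
  description:
    "Search an MCP registry for servers matching a capability, e.g. 'send email'.",
  inputSchema: z.object({
    query: z.string().describe("The capability the agent is looking for"),
  }),
  execute: async ({ context }) => {
    // The agent calls this tool, reasons over the results, and can then hand
    // the chosen server to a second, pre-approved "install" tool.
    const res = await fetch(
      `${REGISTRY_SEARCH_URL}?q=${encodeURIComponent(context.query)}`
    );
    return (await res.json()) as Array<{
      name: string;
      description: string;
      tags: string[];
    }>;
  },
});
```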

1:58:49

So an agent can acquire skills via your prompt, not by you having to know that this MCP server existed. But you still have to add it to your list. Basically like a pre-approval step — I approve these kinds of MCPs. Yeah, that's if you want to do it, or you can go yolo mode and be like, do all that, right? So, not today, but maybe in the future,

1:59:12

right? As we get more confident. Yeah. But I think that's also the downside of like letting a registry find

1:59:19

all the things for you, because they have probably curated it a little bit. But what if you use, let's say, the GitHub MCP, but you don't use the official one — you take a shady one and you just let it rip, because you don't really know what your agent is doing. Yep. It could compromise your secrets or something. So that, I think,

1:59:41

is the worry of just connecting a registry to your agent, because you don't really know what MCPs are doing under the hood. Yeah — I don't know why, but in my mind it goes more towards, like, suppose you have an agent that can read your emails, and then read spreadsheets, and then look up your tax return from last year, and maybe it has

2:00:04

to do a little web search as well. I can kind of imagine a general purpose agent with the ability to acquire skills depending on the task, because anticipating everything in advance would be impossible, I think, and you'd end up building very specific agents for specific domains or verticals or use cases. But with a registry, I wonder

2:00:24

what the potential is for a general agent, if it can do this in a way that's secure and reliable. I think if you have an MCP registry that maybe rates the MCPs in it — like, they make sure that, for example, the GitHub one comes from GitHub — then you're pretty sure it's okay, and if you use

2:00:43

Figma, it comes from Figma — you're pretty good, those kinds of things. Yeah. And I think that's the next level.

2:00:50

Also, if you look at the npm ecosystem, socket.dev is an audit tool that goes through all the npm packages and scores them — maintainability, what's the latest update, what packages do they use. Let's assume you have such a thing for MCP too; then you could just say, if they hit this kind of validation, then you can

2:01:14

just use it without asking. Yeah, dude, registries always incite capitalism, because once a registry exists, you need to be able to search it. You need to be able to maintain and moderate it — the plight of any registry is knowing if the modules within the registry are good or not. And like we talked about earlier in

2:01:35

the show, people want to know what the best thing to do is, or they want to be told, right? So if I go to the registry and there are five different GitHubs and there's no indication which one's the best, I'm going to have a problem, right? Most people will. Yeah. And I feel that this community-driven

2:01:52

mechanism for moderation never works, ever. So, good luck to MCP there, because we did this at Gatsby, we've done this in npm land — it never works. You have to have people who are

2:02:05

the stakeholders for the whole registry, and they take it personally that there are not five GitHubs, and their whole life is making sure the registry is up to date and bug-free. Community-driven stuff has — sorry, this is an economics term — like a moral hazard problem, the agency problem, where once you get one

2:02:26

package that is terrible, that just lets in others, right? The minute you have one, you've messed up the whole thing. So now — sorry, one last thing with this registry. At this point, registries are not new in the MCP space. Like here, you can add your servers to the MCP registry, which is just like in GitHub or

2:02:51

something, and then a client can hit this API. This stuff exists in Smithery and all the other guys right now. What they're trying to do is become the default standard, and then you can have subregistries — I guess Smithery would be a subregistry. And one thing you can do in Smithery is make your own private registries, which

2:03:10

is, then — once again, once you get a registry, it incites capitalism. What do enterprises want? Private subregistries. Who's going to sell it to them? Anybody, right? Anybody who wants to.
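As an aside, "a client can hit this API" really is just plain HTTP; a sketch assuming the registry's documented v0 endpoint (the path and response shape are assumptions — check the registry docs for the current ones):

```ts
// A sketch: list servers from the official MCP registry.
// The endpoint path and response shape are assumptions -- check the docs.
const res = await fetch("https://registry.modelcontextprotocol.io/v0/servers");
const body = (await res.json()) as {
  servers: Array<{ name: string; description?: string }>;
};
for (const server of body.servers) {
  console.log(server.name);
}
```

So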

2:03:24

yeah, pretty cool though. Yeah, I think it is useful if you can host private registries, because it probably has the search APIs, the indexing, and all those things. So it's technically the same thing as npm, or Maven for the Java ecosystem. Like, why would you pay npm to

2:03:42

do private packages and not just host your own Verdaccio or whatever? It's just convenience. So I think it's still useful to have it. And they have the name, right? Like, if you have to choose between

2:03:55

Smithery or, um, Anthropic, and you don't really know the space, or you don't know Smithery, you would by default go for Anthropic, because they made the spec. They probably know what they're doing. Yeah. So I think that's why they're probably

2:04:14

going to make some money with that. Yeah, there's definitely a bias toward the creator, right? Yeah. But in the same vein —

2:04:22

let's see, we have this: Smithery now mirrors to the official MCP registry. Nice. "Our CI will handle the setup for all deployments to the official registry."

2:04:38

So this is the answer to: if you already had a registry, what are you going to do to coexist? Right. Yeah. And my guess is that Smithery, or all the other ones, hopefully will do

2:04:52

some curation eventually, or have something like, this is trusted, this is not trusted. And if I read the blog post about the MCP registry, it's community-driven, so it will probably not be curated well enough. So yeah. But what happens to businesses like this, you know? I mean, we're Smithery fans, we're friends with everybody at Smithery,

2:05:17

like, what happens to your business? That's my question, and I'm curious — maybe we should ask them to come on and we could talk about this. The economics of registries. Step up their game, I guess — UI, ease of use, maybe debugging, deployment. Deployment. Yeah.

2:05:37

Yeah, they've got a playground there as well. Yep. They have a playground. You can do SSE or I think no, you can do the

2:05:44

streamable HTTP servers from them. They don't have private registries, but I'm sure they will have private soon, because capitalism. But I just wonder, what's the endgame then? Because the official registry will probably do all the same too. If I were to be

2:06:03

a bear in this market, I'd be like, "Yeah." But I will say, Anthropic took a long-ass time. So if you do need a registry, I wouldn't use their official one. I'd still use Smithery, you know.

2:06:16

And maybe that's the case for a lot of people, too. Yeah, they probably will pivot a little bit. Who knows? Yeah, unless Anthropic is going to get in the business of hosted registries. I

2:06:28

mean, for money and whatever. Yeah. But if the Smithery guys add security auditing or whatever, that's a step up. Because, look, I always go back to the npm ecosystem, because that's what I'm used to. There are paid registries as

2:06:47

well. For example, socket.dev: you can pay them and they scan all your GitHub repos to see what's malicious or something. That's something npm doesn't do and probably won't ever do. So those extra benefits are good for enterprise.

2:07:05

They might not be for the John Does of the world, but yeah. They probably will make a buck or two out of it. Yeah. What do you think, Alex? Like,

2:07:17

I have to admit, I hadn't really heard of Smithery before this call, but it's really piqued my interest to learn more about them. And when Anthropic released the MCP registry, maybe it's a bit of a noob perspective, but I was kind of expecting it to have a user interface and stuff like that. I thought they would be doing some of that, but it's very much a database, it looks like. And that's all

2:07:40

it is right now. I don't know if that's the endgame or if that's just the beginning of a path they're venturing on now. It's probably the beginning, because they originally started with the registry spec. Now that they've implemented the spec, people are contributing to the core part of it, which is the data. Now you can create an API, you can create a search API, which they

2:08:05

essentially probably already have. And then you can create a UI, and then you can link it directly into Claude, and then you profit, right? So those are the steps for anybody here.

2:08:17

I guess it took them so long because writing specs is really hard, especially when you put something out into the world and you want it to be right. So I think that's the reason why. Are they thinking about some unique problems here, or the same things Smithery was thinking about? I imagine they're both working on the same problem. They

2:08:36

just might have come to slightly different solutions. And Henry from Smithery is an MCP contributor. Oh, cool. And that's worth noting for sure. Yeah, I always associate MCP with Anthropic,

2:08:49

but it's Anthropic et al., right? There are quite a few people involved. Yeah. Yeah, there's a

2:08:54

lot of cool homies working in the MCP community and everything. But you're right, dude. They blazed ahead from where the spec was. Plus, the spec was not really moving that quickly, and people were getting all

2:09:07

annoyed. So all the companies just went forward, but then you get in a tough place where now you have to be co-publishing to the registry. And you know, Henry is such a cool dude in general, but that was his plan all along. He's like, I'm going to keep doing my thing, and then when this thing lands, I'm going to make

2:09:27

sure I support it, no matter what it may be. And so, good on them for doing that. I gotta double-check to see if everyone else is co-publishing or not. But they definitely were first movers in

2:09:40

this space, too. So, I don't know. I don't know what the future holds for these guys, but hopefully it's good stuff. Cool attitude. I just followed him on

2:09:50

Twitter and I posted Henry's handle in the chat if anybody else wants to follow them as well. Looks like a good follow. Very good follow. Um, cool. Let's go on to our last piece of news. Um, and we

2:10:04

kind of mentioned it already, but OpenAI upgraded Codex with a new version of GPT. What the... is this ads? God damn. What the... Welcome to the EU. Welcome to freaking TechCrunch. God damn.

2:10:22

Yeah. OpenAI upgrades Codex with a new version of GPT-5. If you're not living under a rock, you should already see this blowing up on Twitter. You should also see people like Greg Brockman, who was CTO, but now

2:10:36

I believe he's president or something of OpenAI. I forget his new title, but he's saying this is a game changer, you know, essentially shots fired, like Codex GPT-5 is the best. And so, has anyone had a chance to play with it yet? No, I haven't. I'm stuck with my Claude

2:10:55

Code. We were in the original Codex beta. I used it then, but it was sucky, pretty trash, but then it got a lot better since, like, GPT-5 and stuff like that. But having the Codex model, which is, because

2:11:16

Codex is a coding model, but GPT-5 is a mixed model, which has superpowers, right? Under the hood it's like an agent network, or a model network, where you can route to different models that fit the task. So now you have a reasoning-capable coding agent that also has mixed models. It's pretty sick. And I would guess a lot of people are going to build

2:11:41

models like this, this architecture. It's trending now. Yeah. And this makes me, I'm not scared or anything, but it makes me

2:11:52

worried for all the coding agent products. Yeah. Because they make such a good model. But then again, I think if they

2:11:59

expose the Codex model, I could still be the better UI, and maybe change the system prompt a little bit or modify it so I still get better results. But that's more for the non-developers, or the companies that market better. As a raw developer, I would probably use Codex rather than something like Lovable.

2:12:24

Yeah. Or maybe people will start using Codex under the hood in all these products. You know, this is how technology goes, right? We're all kind of aggregating on the same

2:12:37

point. We're all coming to the same conclusions, right? And those with money will survive essentially.

2:12:42

Yeah. The question is how many of these startups will be here in a year or two? Yeah. So, typical OpenAI announcement, you have a benchmark. So

2:12:54

GPT-5 high, this is pretty good. And then refactoring tasks, you know, this can refactor things a lot better. I did hear that Codex is a little bit overeager in refactoring, which is just a notable thing that people noticed. It just wants to do it. It's in their system prompt. You have to

2:13:20

always make the code better, you know. Um, be a junior dev. Think you know better.

2:13:29

Uh, they also use it. Oh yeah. A great thing about Codex is they use it internally to do development at OpenAI.

2:13:35

So they kind of eat their own dog food, or drink their own champagne. So that's cool. And I remember seeing their use of it too. It's very interesting. It's definitely

2:13:48

a world that more people will necessarily start being in, where you're just using tools for everything. And Codex is one of them. But yeah, any thoughts from everyone about where this will lead us? Is it going to disrupt? Any thoughts from the chat? Is this

2:14:06

going to disrupt? I think a big draw here is that for easy tasks, it's meant to be a lot quicker, but for more complex tasks, it will think more deeply. And so it should feel, and it's a lot about feeling, isn't it? Like it will feel faster. It will feel breezier. You'll stay in the zone a bit longer. I don't know, because this is the

2:14:28

same thing GPT-5 pioneered in a sense and made popular, isn't it? The routing aspect. That felt quite new. I don't actually love it in GPT-5, by the way, because sometimes it goes into thinking mode and I'm like, I could have got the same answer to this question a lot faster without thinking, and it kind of annoys me. I have to stop it from thinking. But

2:14:48

then in a codebase, that might be exactly what you need, right? Yeah, I do think GPT-5 gets into reasoning mode, or thinking, whatever, way too much. And that also costs money, but it's also too slow, because I could already think about it myself by then, you know? Like, why do I give it to you if it's going to take so long? But that's why it's probably good for Codex if you just use it for background

2:15:13

tasks or refactorings that you don't want to wait for. Yeah. Then it doesn't really matter how long it takes. Yeah. For background coding tasks. I'm going to run an experiment

2:15:26

with Codex on that. That'd be interesting. I'm going to see if I can have it discover the codebase itself. Like, you go, come back, report, and then once you come back with the report, I'll give you your first task. Then I'll ask it, hey, what did you learn about the codebase, and then do another one,

2:15:46

which will just be interesting, because that's what I do too: I ask GPT-5 to make the plan and then use Opus or Sonnet to do the coding. What was my thought here? Basically that, I guess, if it could do it all, maybe this is the model for creating big tasks or big projects or larger things, because it can do planning

2:16:14

well. We also thought about that. Let's assume I'm a power user: I would have to pay for GPT, maybe pay only for OpenAI.

2:16:49

What do customers want? They want to be told what to do. They want to have a single invoice. Yeah. They want to pay one person and they get

2:16:55

everything they ever want. Right. Yes. That's usually what it comes down to. We have a question from

2:17:02

Paul. What are your thoughts on model gateways? Well, that's a good question. What's your thought?

2:17:15

Sorry. New question. What is a model gateway?

2:17:21

Sorry, say it again. What is a model gateway? It's like the Vercel AI Gateway, where you basically do one request and they handle the routing. I guess

2:17:35

yes and no. Yes, but no as well. I think there are different classifications here. So one is model

2:17:44

routing from a model itself, right? So GPT-5 has submodels, but you don't ever care about them, because it's not a public API to you. So that's one type of model routing. The second type of model routing is you yourself use OpenAI, Anthropic, and

2:18:02

whatever else you want, and you yourself route based on the user's input task. You're like, "Oh, I should use this model for this, and this model for that," based on your project, what you're trying to do.
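Done by hand, that second type of routing is just a function in front of your model call. A toy sketch; the model IDs and the heuristic are made up for illustration, not anyone's real router:

```ts
// Toy DIY model router: pick a model per request based on the task.
// The model IDs and the complexity heuristic are illustrative assumptions.
type ModelChoice = "openai/gpt-5" | "openai/gpt-5-mini" | "anthropic/claude-sonnet-4";

function pickModel(task: string): ModelChoice {
  // Refactor-heavy prompts go to a strong coding model;
  // long prompts go to the heavyweight reasoner; everything else stays cheap.
  if (/refactor|migrate|architect|debug/i.test(task)) return "anthropic/claude-sonnet-4";
  if (task.length > 2000) return "openai/gpt-5";
  return "openai/gpt-5-mini";
}
```

Then there are companies that are called model routers. So, Vercel has one, AI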

2:18:21

Gateway; OpenRouter has one; Helicone has one. And that's just: you pay one person, you get all of them, and then you can do the same type of thing in your product. I think more companies are going to have model gateways like this, because there's an advantage to one invoice. Yeah. Right. One invoice, one payment, and you're good to go. It's for enterprises, right? They

2:18:50

like to pay one invoice, and it makes them switch quicker to different models, or play with A/B testing, those kinds of things. Yeah. And maybe if they have chosen, for example, Mastra as a framework, or even a lower level like OpenAI's SDK, again, they can put a gateway in front of it, and then they don't have to change their code. Yeah. They just have to change the model.
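That "just change the model" idea looks something like this in practice: you point an OpenAI-compatible client at the gateway, and the model string becomes the only thing that varies. The base URL and model IDs below follow OpenRouter's conventions but are assumptions; any OpenAI-compatible gateway works the same way:

```ts
import OpenAI from "openai";

// One key, one invoice, many providers: the gateway does the fan-out.
// Base URL and model IDs are assumptions based on OpenRouter's conventions.
const gateway = new OpenAI({
  baseURL: "https://openrouter.ai/api/v1",
  apiKey: process.env.GATEWAY_API_KEY,
});

const completion = await gateway.chat.completions.create({
  // Switching providers is just a different model string; no other code changes.
  model: "anthropic/claude-sonnet-4", // or "openai/gpt-5", "meta-llama/llama-3.1-70b-instruct"
  messages: [{ role: "user", content: "Summarize what an MCP registry does." }],
});

console.log(completion.choices[0].message.content);
```

I mean, the history of gateways, though,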

2:19:18

like where they came from: there are hosted companies, let's say OpenAI and Anthropic, and at the time there were no open-source variants of this where you could host it yourself; you had to use their API. Then there was a freaking proliferation of open-

2:19:39

source models that you have to run yourself, but they do have OpenAI-compatible endpoints. So then, if you're smart and you believe in capitalism, you're like, "Oh, I'm going to build a product that can support the open-source models and the private models and any new model." And that's when all these model gateways came. Fireworks AI is one

2:19:59

as well, and they're a big one. They're worth like 500-some million dollars. So there's money in this space, too. Do you think OpenAI and

2:20:07

Anthropic will model-gateway themselves, to truly encapsulate the whole industry? I don't think so, because that's basically what GPT-5 is doing, more or less. Why would they make a public gateway? Yeah.

2:20:26

Because it feels like they're moving away from that. Yeah. One thing that people take for granted, though, is that it's not like OpenAI is available everywhere in the world. It's definitely not available in the Middle East, right?

2:20:38

Really? Yeah. So, I mean, I know in Saudi Arabia you can't use any of the foundation models, none of the US ones. None of the US ones? You have to use open source. Okay, then a gateway is your life; then you have to use a gateway. But can you even use the American models? Yeah, no, you can't. But, well, technically I

2:21:03

guess you could use OpenAI through a gateway, but I wonder what the implications of that are legally. You know, it's like a VPN at that point. Yeah. Because a lot of this is data privacy concerns, right? If you are using that, then that data gets transmitted.

2:21:21

I don't know what shady stuff they're doing out there. I'm just kidding. But you know, it's the same idea as DeepSeek, the Chinese model, like the online version. It's like a

2:21:32

limiting thing, like it removes pieces and stuff, and when you use the open-source version or you build it yourself, then you don't have that restriction anymore. Yeah, maybe it's something similar: if they use a gateway in China, you don't get those restrictions, but legally, I don't know if it's allowed. Yeah, just to bring this

2:21:53

Yeah, that's bigger than us. Go ahead. Just to bring it back to Codex. So Codex

2:21:59

is OpenAI's coding agent tool, right? It's like their version of Claude Code, I think. Would you be able to use Codex in, like, Cursor? Or, sorry, the GPT-5 improvements with Codex?

2:22:12

Could you benefit from those in another environment like Cursor, or would you have to use them with Codex, basically? I believe you'll be able to use it in Cursor, because it's just another model. Okay. I don't know if Cursor has support

2:22:25

for it yet, but I bet, well, maybe it already exists right now. But yeah. Okay. Yeah, that's really interesting. When I heard model gateway, I

2:22:35

thought it was like a gateway drug or something. Like, it all started for me with ChatGPT, and then look where I am now. I'm a fiend. Yeah, you're a fiend, dude. How many models you got? Jonesing for more. Yeah,

2:22:49

this is an interesting space, y'all. Think about Codex and Lovable and all these products, and I'm not talking about the Codex model, I'm talking about the Codex product. All these are user interfaces for people who are maybe more GUI-driven, right?

2:23:06

But here's another trend that we saw: the Claude Code SDK, the OpenCode SDK, OpenAI's GPT-5 Codex model. These allow you to do things programmatically. That's a good trend, and I think that'll continue; a lot

2:23:21

of stuff will become programmatic, for sure. And then once things are programmatic, you have APIs, and then APIs can build more products or integrate into your Cursor or whatever you want. So I think it's all good stuff.
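For a flavor of what programmatic means here, this is roughly what driving a coding agent from code looks like, using the Claude Code SDK's query API as documented around the time of this episode; treat the package name and option shapes as assumptions and check the current docs:

```ts
import { query } from "@anthropic-ai/claude-code";

// Rough sketch: run a coding agent headlessly and collect its final answer.
// Package name and message shape are assumptions; these SDKs move fast.
for await (const message of query({
  prompt: "Explore this repo and write a short report on its architecture.",
  options: { maxTurns: 5 },
})) {
  if (message.type === "result" && message.subtype === "success") {
    console.log(message.result); // the agent's final report
  }
}
```

Yep. All right. Should we wrap it up? Yeah. Before we wrap it up, Alex, how was your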

2:23:38

first live stream? How'd it go? Did you like it, dude? I mean, if you look at the timer,

2:23:44

we've been here for two and a half hours, but it feels like about half an hour. So I think that's your answer. And I just want to say, I really admire the podcast in general. I think

2:23:54

it's awesome. So I feel very privileged to come on, and I really appreciate being here today. You know, sometimes I ask newbie questions because I'm learning, but I think I represent a lot of people learning, and I really appreciate both your patience and the ways in which you've explained things. My camera just died while I'm being sincere, but

2:24:15

it got embarrassed for me. But yeah, thanks a lot guys. I had a great time. You were great. It was great. Great

2:24:20

show. We had a lot of fun. We don't usually go deep into white papers, so this was one of those types of episodes, but that was awesome. And we hope to have you on more for sure. For everybody else, thanks for

2:24:32

watching. This is AI Agents Hour. Just to do a final recap: we looked at Agent 3 in both the webdev mode and the agents mode, and we showed our own builder that we were working on.