Back to all episodes

Builders Learn ML, AI News (Grok 4, Windsurf, AI browsers), Mastra Updates and the 8 second ad era

July 14, 2025

Today we do another edition of "Builders Learn ML" with Professor Andy from Osmosis. We talk AI news (including the Windsurf/Grok 4 news) and Mastra updates, and discuss the era of 8-second ads.

Guests in this episode

Professor Andy


Osmosis

Episode Transcript

4:19

What up everybody? How's it going? How's it going, Obby? Going good, dude. How are you?

4:25

Good. Good. Let's kick it off. Uh, welcome to AI Agents Hour. As always,

4:31

I'm Shane, here with my buddy Obby. And yeah, we are going to be doing the same thing we do pretty much every week. We're going to talk about some Mastra updates. We're going

4:42

to learn some ML because we don't really know a lot, but we're learning as you maybe are. Going to do some AI news. We're going to talk about CLI templates, whatever that means, and talk about the era of 8-second ads, not 8 minute ads, you know, 8-second ads. And as always, I have my copy. I see you have your copy

5:03

as well. Many copies. You can get your copy if you don't have one. Our other co-founder Sam's book, Principles of Building AI Agents — you can head on over to, uh, let me find the link here, right there, and you can get your own copy if you want. It's digitally available for free, or if you really want it, just

5:28

send a chat message. This is all live. So, you know, maybe we'll hand out some books if you are in a place we can ship to. We'll send you an actual physical copy. You gave out a lot of books this

5:40

weekend. So Oh, yeah. Tell me what were you doing this weekend because I was not there. I missed out on the

5:46

fun. On Saturday, I went to Mentra, which is in our batch from YC. They're making smart glasses and the OS behind them. Went to

5:58

their hackathon. It was cool. Saw a bunch of people there, gave out a

6:04

bunch of books there. The hackathon was very nice. Um, yeah. So, it was like

6:10

there's just events going on every weekend now. I think we're sponsoring another hackathon coming up in August, too. So, just so many hackathons everywhere. Did they have pizza,

6:22

dude? I actually didn't stay for the food, so I don't know. But it was at YC, so it was like the typical YC buffet type thing, you know? Yeah. Yeah. Uh, yeah. The ongoing joke is that

6:34

you can get free pizza in San Francisco if you just look up AI meetups and go; they almost always have free pizza. So if you're hungry and you're in San Francisco, even if you have zero money and need to eat, go to an AI meetup. Yeah. I mean, Obby, you did say at one point you were going to try to do a whole week eating free every

6:54

dinner. I still need to do that. It's just too easy now, though. Yeah. Yeah. I think we've learned it's

7:00

actually pretty easy to do because there's an AI meetup basically every night of the week. Yeah, every night. Dude, what else did you do? What did you do this weekend? Anything interesting? I spent some time on

7:11

the lake. I, you know, did a little boating. Did a little touching grass.

7:16

Did some touching grass. Got away from the computer for a little bit. You know, not too far away. I mean, I was

7:22

only, like, you know, always within Wi-Fi distance, but, you know, as close to touching grass as I can get without trying to completely disconnect, at least. So, yeah, it was good. It was a good weekend. Got to see some fam.

7:36

Um, so you actually built something with the smart glasses though? Yeah, I built two things that are kind of whack, but I couldn't get any of the other features working yet. Um, my dream for the hackathon was to build an app where I can go to any museum and I don't need to buy those freaking audio things, right? I could just go and I can

8:04

click the button on the smart glasses, look at the painting, and then get the story in my ear, and on the visuals I could see what it's called and who the painter was or whatever. Or, you know, in some of these museums things are in a different language, so I want to look at it and have it translate it

8:22

for me so I don't have to pull out my phone and use Google to actually translate it, you know. So those were my ideas, and yeah, I couldn't really achieve all of them; latency is too high right now. But I could see the vision of wearables and stuff. It's so easy for us to hook into these JavaScript OSes, right? So if your wearable has a JavaScript SDK, you can just bring your

8:49

agent there. It's like super easy. I just imported Mastra and I used it. It was great.

8:54

So, what I settled for is, if you said the word Mastra and stars or something, one of my agents will just print the star count on my glasses so I can always tell people how many stars we have. Um, and the other one was, if you're talking about a Ghibli movie, I have like a Ghibli agent. And so, yeah, if you're talking

9:17

about some Ghibli movie, it'll just give me like a summary on the glasses like, "Hey, they're talking about this movie. This is what it's about." Kind of like Cluely for Ghibli movies, you know?

9:28

You just made your own version of Cluely. Yeah. Just for certain things, I guess.

9:34

Yeah. But uh yeah, with the wearables it's just the latency is so high. So we're just not there yet, in my opinion. But people are building on it, which is cool.

9:46

Yeah. I'm sure I'm sure it will get there. Um yeah, one second. I'm going to go ahead and I

9:53

need to send a link to our guest, which we do have a guest coming up. So maybe you can preview the guest that's coming on while I do my thing here. Who's coming? Andy. Oh, that's the next

10:06

Andy. Andy's coming, dude. This is my first uh Builders Learn ML section. I wasn't here for the last

10:12

one. Okay. Um, our guest Andy is coming. He is the CTO of a company called Osmosis.

10:20

They do a bunch of cool stuff in the RL space and pretty much a lot of other things. And Andy was the youngest tech lead at uh TikTok. We got to make sure everyone knows that. Yeah. Legend. He's a legend. Dude's a legend. And he's gonna teach us

10:41

some stuff. He's gonna teach us though. Yeah. While we wait for that, uh some

10:46

other things, just a preview of what is to come or what is coming. We do have, uh, let me pull it up here. There's a shiny new page on the Mastra website. I'm going to

11:08

show off here if I can actually find it. Why don't I just show my uh tab here? Where are you? There it is.

11:21

So, the shiny new templates page, which you can find if you go into the footer. We have some templates available on the Mastra site, with more to come, but you can get started and see the code. There are links so you can view it on GitHub, see all the code, clone it, run it, and just

11:42

get started really quickly with, you know, a couple different things. You have six right now, there's probably three or four more coming in the next week or so, and we'll continue to update these and ship more. But if you're just looking to get started quickly, now you can. Let's go.

11:59

One more thing we can do a kind of preview. Sam, you know, the one who wrote this book that we keep talking about, he did uh send out a kind of cryptic tweet last week. So maybe something to keep an eye on.

12:20

Can't share any news yet, but we hope to be able to share something pretty soon. Hopefully in the next uh maybe day or two. So couple other things. One one other thing. If you have not uh

12:38

come to a Mastra workshop, we do have a workshop on Thursday, right there. So, if you go to the mastra.ai website, there's a green little kind of call to action to join our workshop. We always have whatever workshop's coming up next. And this week, it's Obby and me.

12:58

Normally, we get someone else from the team, but today, uh, you get both of us again. We're going to be talking about strategies we've been using to code with background agents and do some brainstorming discussion. So, it'll be pretty interactive. If you want to just kind of talk about things that are working for you and learn from others, it's a good thing to come to and

13:18

share some knowledge. I'm stoked for this, too. I'm stoked for a lot of things this week. We have a lot

13:24

going on, but the background agents specifically — it would be cool to run a workshop from the setting where you actually want to write background agents, like a bar or something. Um, because that would be cool. But if you come to this workshop, we're gonna teach you how you can code at a bar, essentially.

13:43

Yeah. Honestly, the background of this is that's how we came up with this idea, because literally Obby was spinning up agents as we were having a drink at a bar, and he's just distributing all these prototypes he wants to build. And then eventually, you know, when he got back to his computer, he took the prototypes,

14:04

learned from them. Basically, it was like quick throwaway prototypes. Like you didn't use any of that code, but you learned something from their implementation that then you kind of rewrote and and have been using, right?

14:15

So, yeah, dude. It was like finding gold. I was like, we got we got to tell people about this. So, yeah, come to the come

14:22

to the workshop. We're going to show you what we know, which is not much, honestly. Yeah, that's why I I like to say it's like a knowledge sharing because we'll share the knowledge that we have. We don't claim to be experts, but I think

14:35

that with the amount of people we already have, you know, I think almost a hundred signed up, we'll probably get another 50 or more by Thursday. So, we'll have quite a few people there that have various knowledge that they can share as well. So, I think we'll learn a lot from each other. So, we're going to show off like Cursor,

14:53

background agents, Claude Code, right? Codex. I'll show off some Codex. Yeah,

15:00

maybe we should get Tyler's TUI thing going. Show that off. Yeah, it'll be good. It's gonna be a party, dude. Gonna be a party.

15:12

And now I think we can move into the guest of the hour. You know, Obby and I don't claim to be experts in machine learning or data science or any of that. We like to think of ourselves as builders, right? We are builders. We like JavaScript and TypeScript and

15:31

building things, but we do know, you know, having some knowledge of some machine learning concepts is actually good, especially with all the stuff that we're building now. So with that, let's bring on Andy. He's our buddy. Andy. And we're gonna be talking.

15:49

Hey Andy. Hey Professor. We jokingly call Andy Professor Andy because every time we hang out we learn a lot, and so we wanted to bring that to all of you watching this, either live or, you know, on the replay, and yeah, just learn more ML. So Andy, what were you hoping to teach us, you know, plebs that

16:12

don't know ML today? Yeah. Uh we can get a little bit into um training using reinforcement learning to train multi-tool agents. Um that seems like a pretty interesting topic. It sort

16:26

of builds off the uh initial session that we had when I was first on the show, and then just kind of goes in depth about how companies like OpenAI, Anthropic — how are they building these models that are really, really good at using these tools. I'm so stoked. I should take a shot every time I say I'm so stoked, because I really am. Andy, I'm so glad to have you here,

16:52

dude. Teach us, please. Happy to be here. And for those that uh did not see the

16:58

first one, we talked all about different reinforcement learning concepts. And so I'd recommend you go — I'll figure out what the date is once you start talking and I'll put it on the screen. I'll put the link to it so anyone who does want to get to it can see the first, you know, the first iteration, which was probably what, like a month ago, six weeks ago, something like that,

17:16

I think. So, yeah, it's been a hot minute. Yeah, it's been a while. Cool. But yeah, let's let's get into it.

17:23

Yeah, I can get started. So, kind of building off of — um, I can do a quick refresher on what reinforcement learning is and how it works right now in the latest open state-of-the-art method, which is going to be GRPO and its many variations. So, GRPO was the main algorithm used to train models like DeepSeek. Its innovation is that it realized, instead of telling the model

17:47

what to do, it can instead reward the model for how well it did some task, and then the model will learn how to do said thing, right? Intuitively, you can think of it like this: previously, the predominant training method was, let's say if I want to teach a model to do multiplication, I'll be giving it, like, you know, the 9 by 9 multiplication chart, right? I'm saying 1 * 1 is 1, 1 * 2 is 2, 9 * 9 is 81,

18:15

so on and so forth, right? So the model will learn to predict these tokens quite well, but the second you go to things that are out of scope, so things like 14 * 14 — if that's not in the training set, there's a very, very high likelihood that the model wouldn't know what the correct answer is because it hasn't been

18:33

taught that, right? You can see how this way of teaching the model is teaching the model to memorize information, but not necessarily how to get to an answer or get to a result, right? What reinforcement learning — specifically what GRPO did — was they

18:51

realized, hey, instead of telling the model how to do multiplication, let's take a pretty strong model that already memorized the fundamentals of how multiplication works, and then let's try to ask it questions, right? So let's ask it, oh, what's 12 * 12, what's 13 * 13, and then let's ask it multiple times, right? Let's ask it 10 times, let's ask it a hundred times, and maybe out of these 100 times, let's say for 12 * 12, the model just

19:17

randomly said, oh, it's probably 144. I'm not sure, but it probably is the case. And then if you ask it 13 * 13, it

19:25

might, maybe like 10 out of the 100 times, say that, oh, it's 169, right? And that's where the core of the algorithm happens, where we see that, okay, what led you to find the result of 169 or 144, right? What led you to find that correct answer? And then let's reinforce that. So whatever tokens you

19:47

predicted that led you to the correct answer, let's reinforce that entire thought chain, right? So that was the core predominant theory behind the idea of test-time compute, right? The model is continuously testing itself and then trying to sample its way to the correct answer, and then we

20:04

are rewarding it for doing that. What would — sorry to interrupt — what would be one of those, like, um,

20:11

reasons why it's 144, right? What would an example be of how it got there? Yeah, I'll make this up, right? But if you ask Claude right now, it could say something like, "Oh, I know 10 * 10 is

20:24

100. I learned that before. I know that 11 * 11 is 121. I know that

20:32

before." And then maybe if I go one more above, maybe there's a way I can multiply it by drawing it out and then doing some mental math — like, you know, 12 times 12: 2 * 2 is 4, 2 * 1 is 2, and then you add another 12 and get 144. Um, because it knows the math

20:51

fundamentals, right? It it knows like oh this is how you do multiplication, this is how you do addition. Um, so it knows these things, but it doesn't know how to use them well because quite frankly, we haven't taught them to do it, right? So

21:03

we are inherently betting on the randomness of the model, or the creativity of the model, to try out new things; the model will eventually try out something that just works, right? Sort of like when a human learned to make fire — did the human know that if you, like, you know, spin wood a lot, it would make fire? No. But the human saw that, oh,

21:21

like, I know that lightning strikes cause fire. I know that if things are dry, they cause fire. Let me just try to mess around with it and see what happens, right? And then once fire showed up, you're like, okay, that's what I do. Let

21:32

me remember that. So next time I need to make fire again, this is how I would do it. Right? Which um it's a very very

21:39

much an oversimplification of how a model learns, but the core concept is sort of similar, right? Um yeah, does that answer your question? Yep. Cool. Awesome. So that's essentially how GRPO works. The full name is

21:53

group relative policy optimization. And the "group" is because we're asking the model to generate 100 different answers, right? "Relative" is we're saying that, oh, 10 out of these 100 answers did really well — so the answers did well relative to the group, right? And then we're optimizing the model. So that's

22:11

why it's called group relative policy optimization. A lot of words there, but that's essentially what it means in a nutshell.
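To make the "relative to the group" idea concrete, here is a minimal numeric sketch of how a GRPO-style advantage can be computed for a group of sampled answers. The group size, the 0/1 reward, and the exact normalization are illustrative assumptions; real implementations add things like clipping, a KL penalty against a reference model, and per-token credit assignment.

```python
# Toy sketch of a GRPO-style group-relative advantage (illustrative only).
# Assumption: the reward is 1.0 when a sampled answer is correct, else 0.0.

def group_relative_advantages(rewards: list[float]) -> list[float]:
    """Score each sample relative to its group: (reward - mean) / std."""
    mean = sum(rewards) / len(rewards)
    std = (sum((r - mean) ** 2 for r in rewards) / len(rewards)) ** 0.5
    std = max(std, 1e-8)  # if every sample got the same reward, there is no signal
    return [(r - mean) / std for r in rewards]

# Ask "what is 12 * 12?" eight times; only two samples answered 144.
rewards = [0.0, 0.0, 1.0, 0.0, 0.0, 1.0, 0.0, 0.0]
print(group_relative_advantages(rewards))
# Correct samples get a positive advantage (their tokens get reinforced);
# incorrect ones get a negative advantage (their tokens get pushed down).
```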

22:31

So, having that understanding of the algorithm, let's go into how we actually train agents using it. You can think of it as two ways. One way is we need to provide the agent with the tools that it will be using while it's generating the 100 different answers. So let's say — I don't know why people use this example all the time — but the weather: getting, like, the weather API or the weather MCP server, right? The agent will then

22:50

get a question of, oh, what's the weather in London? If the agent has no ability to use tools at all, it will just guess something like, I don't know, London's probably raining, it's England. I don't know. But maybe one out of

23:03

the hundred, it says, "Oh, I don't know what this actually does, but this tool seems interesting. Let me just make a call and see what happens." Right? And if it made a call to that, it will get the answer back and then say, "Oh, okay.

23:16

this is how I would get the weather of a certain uh location. I would actually use this tool and this gets repeated hundreds of times, right? For different type of tools and these tools are all made synthetically within those labs. So

23:28

when when these labs are training the model, they will have um tools for search, tools for writing, tools for like visual tasks. Um the the challenge here is not really the algorithm, but how many environments and how many tools you can spin up for that model to use reliably, right? So it's combining a lot

23:46

of traditional programming and environment engineering into this model architecture and the model training itself, and it becomes even more challenging on the technical side. Without going too much in depth here, I can say that not all the data is trainable, right? We don't want to train on the result of the tool call, because we don't want

24:07

the model to always expect a weather call for London to return, like, 70 degrees, right? Because every time it returns something different. And in order for the model to learn that, the model needs to not actually know what the tool will output. It only needs to know what the model decided to do with that

24:24

information, right? And that's how, in reinforcement learning terms, a policy gets created, which is a strategy where the model will deterministically go, okay, I'll call this tool, then I'll call that tool. This tool looks like something that I used before in my training data, therefore I will call it

24:42

again, and then use that data to do some further computation, right? You can see where this gets really, really tricky when you have to do writes. So in many cases, when you have to do, like — oh, I want to schedule — let's say you have an AI personal assistant or AI account executive and you're trying to schedule meetings with potential customers, right? In the training

25:07

session, you are scheduling the same meeting 100 times. Well, then you need a hundred individual sandbox environments in order for the agents to not have results collide with each other, right? Because then agents will think that, oh, any writes that I do, I will always face a collision, so I'm just not going to do them. Right? So making sure the agent has the cleanest, like,

25:30

clean-room environment is also another very important thing, and that is what's really challenging in the field of RL. But once you can solve the environment, if you can solve the tools, training the agent is pretty much as simple as just getting a lot of tasks and a lot of questions. Okay. Um, that's step one.
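One technical detail from above — not training on the result of the tool call — is usually handled by masking those tokens out of the loss. Here is a minimal sketch of that idea; the `tokenize` helper and the message roles are assumptions for illustration, and -100 is just the common convention many training libraries use for "ignore this token in the loss".

```python
# Illustrative sketch of masking tool results out of the training loss, so the
# model learns what it decided to do, not the specific value a tool returned.
# `tokenize` is a hypothetical helper (str -> list of token ids).
IGNORE_INDEX = -100  # common "skip this token in the loss" label value

def build_labels(messages, tokenize):
    """messages: list of (role, text) tuples in conversation order."""
    input_ids, labels = [], []
    for role, text in messages:
        ids = tokenize(text)
        input_ids.extend(ids)
        if role == "tool_result":
            # The weather API returned "70 degrees" this run, but it won't next
            # run, so these tokens are never part of what gets reinforced.
            labels.extend([IGNORE_INDEX] * len(ids))
        else:
            # The model's own reasoning and tool-call decisions are trainable.
            labels.extend(ids)
    return input_ids, labels
```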

25:53

And then step two is you can actually take a look at how these agents are behaving in production, right? So let's say you have a Mastra agent and it's currently in production, doing — like I said before — acting as an email account executive, right? Um, you can

26:06

take this trace and say, okay, let's look at the implicit signals of what happened after this agent scheduled a meeting with a potential client. Was the agent able to find the correct time immediately? How many email back-and-forths did it take for the agent to do this? And did the potential client

26:26

respond in a positive or negative or neutral way? Right? These are all signals that can then be fed back into the training process to tell the agent that, hey, this strategy you tried here um pissed them off. Don't don't do it

26:39

again. Or, this strategy you tried here worked pretty well — it was a very simple email, very simple structure — keep doing that, right? And that's kind of the current frontier of what we're doing as well, right? The ability of taking implicit signals and then translating that into usable, useful

26:58

tasks. So, imagine in your Mastra agent, if you're depending on Claude 4 or o3, these models are static, right? These models are static — with every single run you could give it new memory, so it will have knowledge of what happened before, but a lot of the times the way that it reasons about things is the

27:17

same, right? Even if you tell a model to not do something, when the model starts producing a lot of tokens, it will eventually forget about that instruction, right? Models forgetting, or not following a list of things to do, is one of the biggest weaknesses of even frontier models today. And quite frankly, it's because they weren't trained on enough

27:40

examples of multi-turn agent tool use cases, right? So the attention that these models pay is not to the entire sequence, but just to the latest couple of messages. Um and and that's really that's probably like the next two or three model iterations are going to address that. Um yeah, and then this is

28:00

something that we are also doing as well. So if you do training — let's say you're doing this learning on the tool call for the weather, like this simple example, right — when you actually produce a new model that's trained to do this, how does that change the message structure? Is the model itself calling the weather tool in

28:28

the specific way you want it to? Is it doing tool calls anymore? Can you explain how that works? Yeah, the model will make the tool call immediately, rather than thinking about it. So that's another benefit of using RL. RL at its core — I think this is

28:47

the most important sentence — RL trades model entropy for performance, right? Model entropy is the uncertainty and the creativity of the model, where if you look at DeepSeek's thought process, a lot of it is like the aha moments, the "oh wait" moments, or the "let me try again" moments, right? But that's actually not an effective use of our time, right? We

29:08

understand that this improves model performance, but unfortunately this also adds a lot of latency, right? So what RL does is it makes the model certain about what it wants to do, and it will immediately make the tool call, it will immediately give you the result, it will immediately then do something to follow up with that. So one of the hidden

29:27

benefits of using RL is that your models become faster. Of course, because the model is smaller, it's faster, but also it's thinking less. It's more certain about what it wants to do. It knows immediately how to do these things instead of figuring it out, looking at the same context blind every single time. Um, and this is

29:46

something that memory doesn't address, right? Memory will say, oh, you should use these tools or you should do this, but it doesn't stop the model: if the model has a tendency to question itself, it will still do that, right? The model is like, okay, the memory is telling me to do this, but should I do this? Oh wait, but actually — you know, it's very verbose

30:04

about these kinds of things. Yeah — wouldn't that be like a negative towards context engineering versus this, or is there kind of a thing there? I think it's not mutually exclusive. I think having context is also still helpful. What we're reducing here is the bloat of test-time compute, right? One example that we've seen

30:29

recently — this is from customer deployments — but the models that they were using, usually their thought process is up to 6 to 7,000 tokens per call, right? The accuracy is about 80 to 85%. You're thinking 6k tokens per call, that's what we're working with, right? With RL we were able to get that thinking process down to 300 to 500 tokens per call. So that is over a 20x reduction, and then the

30:56

accuracy, see, was only around a two to 3% improvement. So, it wasn't that crazy. But mind you, the original model that we were using was OpenAI's o3 model, and the model that we trained was a Qwen 3 14B model, right? So, we used

31:09

the 14 billion parameter model to not only beat out o3, but also think less and be more certain about that task, right? So, that's really the benefit. And even on the context, they still have a ton of context, right? Just

31:21

the system prompt plus the context plus the memory is already at 16,000 tokens, and we are still using that information to make it more efficient, right? But with these tool call beh— sorry, go ahead. No, go ahead, Obby, keep your train of thought. I got some questions, though, when you're done. So, the tool call part, right, where it now knows to execute the tool call — the reasoning behind that is not

31:46

costing me any more tokens, because it's built into my trained, like, my reinforced model. Mhm. And the weather agent is a stupid example actually, right? Because the

31:57

type of stuff you want your model to have as second nature is some complex workflows or some other types of things, not some weather tool. So, like, a layman wouldn't be doing this. It'd be someone who actually is doing some intense stuff, right? Yeah. This technique is certainly

32:14

overkill for the average developer, right? If you're just starting out on agents, if you're just starting developing, we wouldn't recommend tuning the model weights at all, because you'll be changing the model architecture and prompts so much that if you train a model specifically compatible with that architecture, well,

32:32

it's not going to be useful once you change it even slightly, right? Where it does become usable is when your agent becomes very mature and you don't anticipate changing the architecture that much — everything has been working okay — and you realize that, okay, to get to the next step I need to improve model-based

32:50

performance that's when RL would actually make sense. Yeah. Yeah. Totally. Okay. Now now I I need to ask the the

32:56

— me not being an expert — maybe dumb questions that, you know, if you're watching this you might be thinking some of these same things. Okay. So, you kind of answered some of it when you said it's not something you pull off the shelf right away, right? Typically, you get it working and then you eventually realize you're

33:16

running up into some kind of limitation, and then you maybe say, "Okay, we need to use RL." And there's also maybe performance and cost benefits to doing so once you have that model that's trained on a specific task. One kind of really basic question: what's the difference between, you know,

33:36

RL and and fine-tuning? Are they the same things? What's the difference? When should I, if I'm a builder thinking

33:42

about these things? I I hit the limit of what I think the current models can do. Maybe I want to save money. Maybe I want

33:47

to improve performance, maybe improve accuracy a little bit. What what are the differences between those terms? Yeah. So, supervised fine-tuning is the technique that teaches the model um how to respond, but not why, right? So you

34:01

provide it input labels and output text, and the model will learn to exactly predict whatever you tell it to — which is the example I told you about with the multiplication question, right? If you give it a multiplication table, like, you know, up to nine, it will do that very well, but if it's over nine, it doesn't know what to do, right? So that's where RL

34:19

is helpful. But again, that's a very stupid example. The reason why I say that is because if your use case is extremely, extremely narrow — so much so that you already know the entire scope of the user questions — then supervised fine-tuning could be a very good option for you, because SFT is a lot faster, because it's not using test

34:42

time compute at all, right? It's just telling the model, hey, produce this if you see this, don't ask me why. And that's okay, sure, but this requires a lot of data, right? Fine-tuning may require up to 10, 20-plus thousand examples for it to be viable, and the second that you see something that's out of distribution, your model will fail.
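To make the contrast concrete, here is a minimal sketch, with made-up data, of what the two setups look like: SFT learns from explicit input/output pairs, while RL only needs prompts plus a reward function that scores whatever the model produced. The field names and prompt format are illustrative, not any particular library's schema.

```python
# Made-up illustrations of the two data shapes.

# Supervised fine-tuning: you hand the model the exact answer to imitate.
sft_examples = [
    {"input": "What is 7 * 8?", "output": "56"},
    {"input": "What is 9 * 9?", "output": "81"},
]

# RL (GRPO-style): you only provide prompts and a way to score sampled answers.
rl_prompts = ["What is 14 * 14?", "What is 23 * 17?"]

def reward(prompt: str, completion: str) -> float:
    """Verifiable reward: 1.0 if the completion contains the correct product."""
    a, b = [int(x) for x in prompt.removeprefix("What is ").removesuffix("?").split(" * ")]
    return 1.0 if str(a * b) in completion else 0.0
```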

35:08

So again, this is your judgment call. If you think your use case is extremely narrow, or no edge cases can happen, and your training data captured pretty much the entire spectrum of how your model will be used, then yeah, by all means go with supervised fine-tuning, right? RL, on the other hand, is really for two situations. One situation is you don't have that

35:26

much data, right? With reinforcement learning, because we're teaching a model how to think and how to solve problems, you don't need that many samples. Uh, you can get started as little as 50 or or or 100 samples. You don't actually

35:37

need that much. And 50 samples is actually quite doable by hand, even if if you really wanted to, right? Um, the other reason is if you need your model to generalize better to a vertical, right? So I know maybe my my customer

35:51

support agent knows like 80 or 50% of all the potential use cases and it knows the SOP, but it's not reasoning about it correctly, right? That's when RL will be helpful. RL will teach the agent how to be fair to everybody, right? Like everyone gets the same treatment; just

36:08

because the model decided for some reason to be creative today, it doesn't mean that it's going to treat one customer differently than another. And that's what RL is about, right? Like making sure that the model is fair and deterministic. Um, an LLM inherently is

36:22

a very unstable and blackboxy system. RL aims to make it somewhat more predictable and at least somewhat more explainable, right? How come more people don't want to just do RL? Like

36:34

if they could just make 50 examples, right? Like, it shouldn't be that hard. Like, if you're a verticalized agent, why aren't you doing RL?

36:42

RL is a very difficult technique, to be fair. So to do RL, even on small models, you very quickly need to do multi-GPU training, right? Access to the hardware is one limitation, and the second limitation is the construction of a reward function. The reward function is something that only exists in RL, because that is the rule where we tell the model how

37:05

well it did, right? For math it's very simple: did you get the correct answer? If you did, good job; if you didn't, you did bad, right? But for things that are less verifiable, like writing an email or scheduling an appointment — how do you know that the model did what it's supposed to do? There are very

37:22

simple things like oh was the Google calendar invite created? Yes, good job. No, you did bad. But then there's the

37:29

potential for reward hacking, where the model realizes, oh, I don't actually need to do anything — let me just create a random Google Calendar link and then I'll get rewarded, right? So then the model is learning how — because models want

37:40

to be lazy, right? They want to get the reward as fast as they can.
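Here is a minimal sketch of the kind of reward function being described, using the scheduling example. The field names and the specific checks are hypothetical, but they show the general move: reward the verifiable outcome and add checks so the model can't get paid for producing a random calendar link that satisfies nothing.

```python
# Illustrative reward function for a scheduling agent (all fields are hypothetical).
# The point: reward the verifiable outcome, and close the obvious reward-hacking
# route of just creating any calendar link.

def scheduling_reward(task: dict, result: dict) -> float:
    reward = 0.0
    invite = result.get("calendar_invite")
    if not invite:
        return 0.0  # nothing was created at all
    # Anti-reward-hacking checks: the invite must actually match the task.
    if invite.get("attendee_email") != task["customer_email"]:
        return 0.0
    if invite.get("start_time") not in task["acceptable_times"]:
        return 0.0
    reward += 1.0  # correct, verifiable outcome
    # Small shaping bonus for getting there in fewer email round-trips.
    reward += max(0.0, 0.5 - 0.1 * result.get("email_round_trips", 5))
    return reward
```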

37:47

So the construction of a good reward function is also very difficult, and requires a skill set that a lot of agent builders don't have right now, right? Because you need to understand what the potential vulnerabilities are. You have to think like a hacker, right? You have to think, if I was a hacker and I wanted to get the highest amount of reward while not doing any of the work, is that possible, and how would

38:04

I do it? And that's really why a lot of people don't attempt this, right? The model training that we've done, like 70B, or we recently did a run on Qwen 3 235B — those require up to 256 GPUs, right? You need a cluster of 200 or so GPUs. That's not easily

38:24

reservable. So it is still a very gated technology, because access to hardware is difficult and access to good knowledge on how to create reward functions is also difficult. For sure, that makes sense. A reward function, though, is just like an eval, right, of

38:41

the response. But then yeah, but then you have to like encode it exactly to the use case that you're you're testing. Yeah. Yeah. It can't be a simple eval, right? Because your model will be actively working against you to try to

38:55

crack that eval. Unlike if you're just evaluating a model in production, the model doesn't know it's being evaluated, right? Are there humans in the loop when reward functions are written, in some cases?

39:06

Yeah. So that's called RLHF, which is reinforcement learning from human feedback, which is the original way reinforcement learning was used in LLMs — because we didn't have the verifiable rewards, there wasn't GRPO. The original algorithm is called PPO, and that algorithm either depended on another machine learning model to tell you how well you did, or

39:30

depended on a human to tell you how well you did. And this is still also possible in GRPO: essentially the human acts as the reward function, right? It is basically bulletproof, because the human is able to use human-level intelligence to identify the good cases and bad cases, but again you're

39:48

heavily limited by the human bandwidth here, right? It's very time-consuming to do these kinds of things. So then — maybe you don't have that much knowledge of this, I can't say that I do, I've just done a little research, heard a little bit about it — Grok 4 came out recently, right? Are you familiar at all with how Grok 4 was trained compared to the other models? Do you have

40:11

So I wouldn't say I'm an expert, but I know that they leveraged a lot of post-training RL on it, and a good friend of mine is also on the Grok 4 team, and he was saying that it was mostly a post-training effort. But I think it's Grok 4 Heavy — I think that's the model name.

40:30

That's the one that really performs well. Um the other models I think are on par with with the state-of-the-art methods, but obviously we don't know the parameter count. Um yeah, there there's some interesting things that they've done on the system prompt side, which I think many people have seen on the news

40:49

recently. Um but yeah, very interesting model nonetheless. Yeah. Yeah. I I just I had heard some rumblings that they use less human feedback compared to some of the other

41:02

models in their training, and that allowed them to automate more things maybe than was previously possible. Or, you know, in previous models maybe they used less human-annotated data sets, things like that. And again, I don't know much about it. Yeah, one hypothesis that people were floating is RLAIF, which is RL from AI feedback, right? Um,

41:29

right now, like I said, human-level intelligence is good at identifying bad results. State-of-the-art AI models are also okay at doing the same thing, right? And AI models don't have the bandwidth issue of humans. They're also expensive, but they're not as expensive as hiring somebody to do this, right? So,

41:47

one approach that people have been trying is letting an AI model judge the output of the model as it's learning, right? Using, like, Claude 4 or Gemini 2.5 Pro — these closed-source models — to teach, and then the teacher model acts as the reward model in this case. And this is potentially what Grok 4 did as well.
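A minimal sketch of that teacher-as-reward-model idea follows. The `call_judge` helper, the prompt wording, and the 0-to-10 scale are assumptions standing in for whatever closed-source model and API you would actually use; it is not a specific provider's interface.

```python
# Illustrative RLAIF-style reward: a stronger "teacher" model grades the student
# model's completion instead of a human. `call_judge` is a hypothetical helper
# wrapping whatever judge model API you have access to (returns a string).

JUDGE_PROMPT = """Rate the assistant reply below from 0 to 10 for correctness
and helpfulness. Reply with just the number.

Task: {task}
Assistant reply: {reply}"""

def rlaif_reward(task: str, reply: str, call_judge) -> float:
    raw = call_judge(JUDGE_PROMPT.format(task=task, reply=reply))
    try:
        score = float(raw.strip())
    except ValueError:
        return 0.0  # unparseable judge output gives no reward
    return max(0.0, min(score, 10.0)) / 10.0  # normalize to 0..1
```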

42:07

So if a user has a bunch of evals on their agents, right, and they have a bunch of different evals — ones they wrote themselves, like tool call correctness, etc. — and let's say they have volume too, they have like hundreds of thousands of data points in this database of evals: are they RL-able at this point?

42:32

Yeah, definitely on their use case — but if they have enough volume, does it make sense for someone at that stage to start doing some of this stuff beyond, you know, just using OpenAI, for example? Yeah, this is where I would ask a few more questions: well, what is your eval accuracy like, right? Is it

42:51

good? Is it bad? If it's if it's like 99% sure go figure, right? Like as long as cost is not a consideration for you

42:57

then sure, the system works well. But if it's not there and you want it to be there, then RL can definitely be a use case here. RL is kind of one of those funny things where it feels weird saying it, but it is faster, because the AI models you train are smaller. It's sometimes better, because it's able to think in a more deterministic way, and

43:16

it's also cheaper, because the models are just smaller, right? So it is like you get the best of three worlds. But I think my suggestion to these people would be: what does your company actually need? Does your company actually need to save cost right now, or do you want to move fast? Right? Because RL is really for a company who's already done

43:34

sprinting the initial cycle to get to product market fit and now they just want to improve their models to be even more reliable even more resilient and maybe save some money along the way. Yeah. Yeah. Yeah. You can often save like over 90% of the cost by doing this.

43:51

So, we do have a question in the chat that's kind of related, from Jimmy. What RL implementation types or tools are there currently, and how to choose? Yeah, how to choose them.

44:03

Okay. Um yeah, so that's a very loaded question. I'll break down how you would go about this. So, RL implementation tools — the most

44:14

popular developer-facing ones would be something like Unsloth or Axolotl. There's also the Transformers TRL library — I think it's called Transformer Reinforcement Learning, that's why it's called TRL. Um

44:27

these are frameworks that are geared towards uh enduser developers to try out reinforcement learning. Um and then the model type I would typically recommend reasoning models because reasoning models you can actually uh see different reasoning traces and enforce different policies. It doesn't make sense for a

44:45

non-reasoning model to do this. U because a lot of times you the non-reasoning models performance is just not good. It will never get to remotely the correct answer and there's nothing to train on. Right? So reasoning models

44:56

like the Qwen 3 family, the Llama 3 family, and then using frameworks like Unsloth, Axolotl, or TRL. Yeah, these are the ways to start. And then for model sizes, a good performant model is typically achievable in the 8 to 14 billion parameter range, depending on how complex your task is. If your task is text-only with very strong verifiable guarantees,

45:24

then those models typically work. But recently the smaller models, like the Qwen 3 4B and the 1.7B family, have also been shown to work somewhat okay for things like data extraction or translation — where if the data extraction is very deterministic and

45:43

it's similar data every time with slight variations, you can get away with very, very small models, right? But for things like agent tool calling, I would recommend anything 8 billion parameters and beyond at this stage.
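For anyone wondering what "start with TRL" roughly looks like, here is a minimal sketch modeled on TRL's GRPOTrainer quickstart. The model name, dataset, and reward function are placeholders, and exact argument names can shift between TRL versions, so treat it as the general shape rather than a tuned recipe.

```python
# Rough shape of a GRPO run with Hugging Face TRL (a sketch, not a tuned recipe;
# check the current TRL docs, since argument names can shift between versions).
from datasets import load_dataset
from trl import GRPOConfig, GRPOTrainer

# Any prompt dataset works; this public one is just a placeholder.
dataset = load_dataset("trl-lib/tldr", split="train")

def reward_concise(completions, **kwargs):
    # Toy reward: prefer shorter completions. A real run would use a
    # verifiable, task-specific reward like the examples earlier.
    return [-abs(60 - len(c)) / 60 for c in completions]

config = GRPOConfig(output_dir="qwen-grpo-sketch")  # group size, lr, etc. also live here
trainer = GRPOTrainer(
    model="Qwen/Qwen2-0.5B-Instruct",  # small model so the sketch actually runs
    reward_funcs=reward_concise,
    args=config,
    train_dataset=dataset,
)
trainer.train()
```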

46:01

Yeah. Dude, you know what we should do next time you come on, Andy? We should actually do some RL. Yeah, we can do — people can do synthetic. It's all fake, right? So, we can do synthetic rewards, everything — like, we just synthetic everything. That'd be pretty interesting, because I

46:15

do think this is important for people, like maybe a couple years from now, in the builder community at least. Like, we're not there yet, you know? But also I think sometimes it's nice for people, because a lot of the people that are watching this are either building AI tools, AI applications, AI agents, and they probably aren't at the

46:37

stage yet where they need to consider RL, but they likely will get there, right? If whatever they're building goes well, eventually they're going to try to eke out every bit of optimization, whether that's cost, quality, performance — those things matter, especially at scale. And ultimately having at least a baseline understanding of how it works could be helpful, whether they're trying to do

47:01

it themselves or whether they turn to someone to help them actually try to do it right. They'll at least be able to talk the talk enough to know what questions to ask when they're going through the process. So I think something like that could be beneficial.

47:13

And then selfishly I just want to see it, because I don't know how to do it. I'd like Osmosis to do it. Yeah, we could definitely show off an example of how to set it up. Yeah. Yeah. I mean, this is probably a good

47:28

point, too. Like, tell people a little bit about, you know, obviously you know some stuff about ML, but what do you what are you all doing at Osmosis and and I'll also pull up uh for those of you who want to kind of follow along. I think those are the right links. Yes. Yep, that's correct. So, Osmosis um we are a company that offers

47:45

reinforcement learning as a service. People can come to us with questions about their agents, saying, hey, I tried prompt engineering, I tried using the latest models, it's just not working that well, it's too slow — can I have something else that helps me out with this? And then that's where we come in, right? So we would take a look at what

48:03

data you have right now, what use cases you have right now, is RL even appropriate — because a lot of people have a misunderstanding of RL; it is not a miracle algorithm, right? There are a lot of limitations with RL. And then if we see a fit here, yeah, we would either offer you the platform directly if you wanted to learn yourself, or we

48:21

can also help you create the models and host it for you as well. We are very agnostic towards how the models are hosted. So if you want to own the model weights and say that okay you have a you built your own model that's totally fine with us. We can send you the model weights as well. So we are essentially an abstraction of a post training team

48:38

and helping you to get a very performant model for a very, very little amount of money, right? Instead of hiring two to three ML engineers, you can use us as a partner, and then we can help you do all this stuff.

48:52

Awesome. So, another question. I'd buy it. Agreed. Another question. So,

48:58

Jimmy asks, "I've bumped into OpenManus-RL twice now. It's a little confusing. Any reviews on this from you?" I

49:05

know nothing about it, but Andy, have you heard about this? Yeah. So, OpenManus-RL is, I think, led by a team at UIUC — don't quote me on this, but it was led by a university initiative. These tools are academic research output. So, they

49:25

would not be production-ready, and the reason why I say that is because academic research papers and the frameworks that they produce often do not have optimization from the GPU side, right? I can tell you one very simple — actually not simple, but one very entry-level — example: if you have a cluster of 8 GPUs and

49:45

you're doing inferencing and training at the same time, how do you resource schedule, right? Because a lot of people, what they do is they just dedicate two GPUs to doing the inferencing — like generating the 100 answers — and then six GPUs to then do the training, right, to update the model. That's one issue. The second issue is the environments. Sorry, on the

50:06

first issue um because you're only able to use two out of eight or six out of eight GPUs, your utilization is quite low. So, you're wasting a lot of money doing this and the efficiency is not high because there's a lot of IO transmit that is um being bottlenecked by you know raw IO. Um so, how do you figure out solutions

50:23

to that? That's more of an infra engineering problem, or an ML infra engineering problem. And then the second part is the environment I talked about before. You have so many tools that are all

50:34

different. you have so many environments, how do you make them talk to each other? Um, this needs to be very extensible, right? But research frameworks are very opinionated where they say this is how you use it and

50:46

there's no other way to use it, right? So the accessibility is also an issue. But it is a very good tool to help you understand how RL works, right? If you're interested in, like, oh, I want to see — I don't think OpenManus is affiliated with Manus in any way, they're not the same thing — but if you want to see how an agent learns or how an agent is trained in an

51:08

academic setting, it's a very good resource to go into. Yeah, so that's what I would say about these frameworks in general. Cool.

51:20

Yeah, OpenManus and Manus are not related. They're two different projects. And I will say Manus and OpenManus' logos are not buttholes. One is like a click and the other one's a high five.

51:33

So it continues the theory that American model companies or projects always make theirs buttholes and the Chinese ones do not. So yeah, we got a whale, we got hands. Scoreboard. We need to calculate a scoreboard for this, because every project we talk about

51:52

Yeah. Like every new one is like goes on a scoreboard. Is it a butthole? Is it not a butthole? butthole or not, you

51:59

know. And then we will do some reinforcement learning on a model so it can know if the logo is a butthole or not. Do a classification. Yeah.

52:12

Awesome. Well, anything else, Andy, that you wanted to talk about today? I I feel like my head is about to explode, but I learned a lot. Hopefully you all, if you're watching this in the chat, if you have any questions, now's the time to ask before we uh Andy drops off here.

52:26

But we'll definitely have you have you on again. But anything else you want to say to anybody? Um yeah, that's all for for today.

52:33

Thanks, Professor. Yeah, Professor Andy in the house. So yeah, that was really cool.

52:40

Yeah, appreciate you coming on and sharing some of your knowledge and yeah, we'll definitely have you on again here in the future. Yeah, thank you for having me on and yeah, happy to be on anytime. Okay, cool. Take care, guys. Andy, see you Andy. See you.

52:54

Dude, I love that guy, dude. He's He's the coolest, dude. And he he can teach me in ways I don't feel dumb. Yeah. Yeah. I went to I went to eat dumplings

53:07

with Andy once and, man, it was a great time. Y'all should get more friends like that, everyone watching, you know — smart dude. Yeah. People you can learn from. That's

53:19

that's also made me think, you know, like if you're assembling all this data, we have all this data, too. Eventually, maybe we could send it over to Osmosis and they can make new models for everybody. Yeah. I mean, isn't that, right? Sorry to interrupt. You have to think that this stuff will get easier.

53:37

Yeah. It has to. It has to. It's it's you need you need an Andy to do this today.

53:43

Yeah. So, Andy is building an Andy for everybody. Exactly. He's trying to make it easier so you can do it. And yeah, it is going to get easier. These things

53:55

that are hard now, in, you know, who knows the timeline — two years, five years, whatever — it's going to be a point-and-click SaaS service, right? Send me your data, get a model instantly, like point and click, and swap between: send half your traffic to the original one, half your traffic to this new model, compare the

54:14

results. It'll be like feature flags, you know. It will be like that once the builders get here; we're going to make that happen. But we need Andys — Andys need to be there, otherwise we can't do it. Andy needs some more builders too,

54:32

because imagine you just have, like, a cron job running, just observing all your data coming in, your evals, and just — you know, it's like when your database has new versions, snapshot versions — it's the same thing, you're just putting out models, you know, every three hours, just based on what's happening,

54:51

dude. Yeah. And hopefully, Andy, if you're watching this — I want to slowly incept you — just make all this available through an API, please. They're doing that already, or they will be doing that, I think. Yeah. And maybe they already have, but

55:04

then it's like, oh, whether you're a company like Mastra, or you have a cloud product, or you're, you know, a competitor — it's like, we have the data, the users have the data in our system. We make it point and click. You send it over. You get a new model, you plug it in. You swap the traffic

55:21

between it. you measure it, you do e do your evals on it, make sure it works, roll it out to everybody. I mean, there there's so many cool things you can do once this becomes easier. And I know the there are a lot of complexity. There's a

55:34

lot of complexity with all the hardware requirements. You got to know how to run, you know, run things on GPUs. Like I don't know how to do that. That's why you use osmosis or whatever. Exactly. But you but slowly they break

55:45

down the barriers and others like them break down the barriers. Then it becomes accessible to builders like you and me and everyone else that's probably watching this. Yeah. Yeah. Cuz what if there's just a day where like you know how people used to

55:57

always ask us like oh you guys going to fine tune your models and during YC and the answer would be like oh yeah it just automatically happens you know. Yeah exactly that that's the dream. That's the new dream right is that it just it can just happen or it's a button click rather than a whole complex

56:15

process. Because our answer was always if you're thinking about fine-tuning, especially if you're in YC, you probably shouldn't be doing it unless you're a company like Osmosis that's doing the like the reinforcement learning or fine-tuning or whatever. But for the most part, you should get, you know, you should pay the for the best model to get a good prototype to

56:35

get get people actually using it with good accuracy. Make sure you found a product that people actually want. And then you care about cost. then you care about scalability, performance, all those things become more important once

56:48

you know you you have something that people actually want. Yeah, pretty cool. Pretty goddamn cool. So question from the before we get into

57:00

AI news, from spacework: can you make some more examples or docs about implementing human-in-the-loop approve/decline tool calls with the Mastra client SDK in AI SDK useChat? It's a good idea. I think we have that on

57:15

the list somewhere, but we 100%. We like to hear, you know, thanks for sharing it because that the more we hear about it, the better we, you know, kind of up the priority as more people are asking for things. So Paul, if you're Paul, if you're watching, we need some docs.

57:33

Things are getting better internally — like, things are getting better on this. Yeah, just the internals of making it happen: we're trying to make Mastra work well with AI SDK, React, freaking AG-UI, Assistant UI, and we're going to be investing in little integrations with the partners. So AG-UI already did it; they own the integration. We might

58:00

have to own certain integrations based on who we're integrating with, unfortunately. But it's a big thing. So stay tuned — it's literally on our roadmap. It's like

58:11

literally right here on this board that I'm looking at right there. It's right there. Anyway, yeah. All right. Should we talk some AI news, dude? Let's do it. But can we talk some

58:23

first? Or maybe it's part of the news, actually. No, I mean, we should definitely talk some That that's that's part of the appeal of the show. If you're watching this, it's not I want to make sure it's not on the uh

58:35

agenda already. Yeah. Look at the agenda. You tell me if it's not on there and then you tee it up if

58:42

Let me see. Let me see. Let me see. Let me see. Um yep, it's already on there. So, and I

58:47

think it's the first topic. So, which one? Okay. All right. Yeah, the I just I just labeled it on the

58:52

agenda. Windsurf. What the — dude? I switched to Cursor at such a

58:59

good time, man. It's crazy, dude. I Okay, so for those of you, if you follow AI news at all, you're on X, you've probably heard this, right? You've probably seen this. I don't know if it's even drama. It's just this. It's

59:12

just wild. It's like a wilder drama. It is drama for sure. So, originally, so let me set the stage

59:18

and then we'll we'll share some I mean, you can do your research. You can share some links if you want to want. But so, originally, OpenAI was going to purchase I think it was like 40. I'm making up

59:28

some of these numbers. Like, I'm guessing, I don't remember exactly. I think they were going to purchase like 49% of Windsurf, if I remember right. In chat,

59:34

correct me if I'm wrong on any of these things. We'll try to be accurate, but mostly we just want to BS about this stuff. Okay. So, OpenAI was going to acquire Windsurf — for 49% of the

59:47

company. Rather than try to acquire the whole company, they were going to acquire a large percentage of it, get access to everything, and probably use it for internal purposes, right? They wanted Windsurf, but — maybe because of regulation — they didn't want to buy the whole company. Then

1:00:03

it seems like there was some tension, because Microsoft owns so much of OpenAI. Windsurf was a little skeptical of basically letting Microsoft have access to everything that they have. You know, Microsoft has VS Code. If Microsoft then gets access to Windsurf, they could just roll it into VS Code. And why does

1:00:21

Windsurf exist? So I think Windsurf was rightfully maybe a little hesitant about that. They wanted some kind of carveout, which Microsoft had to agree to, so that OpenAI wouldn't have to share some of the proprietary stuff. That's kind of my read on it. And then Microsoft didn't

1:00:39

agree, or they let it pass. They never agreed, and so the deal kind of fell apart. That's my take. So what a tangled web OpenAI has weaved.

1:00:51

Seriously, dude. I mean, it's always some kind of OpenAI drama. But then all of a sudden with Windsurf, the CEO and leadership team, not Windsurf the company, not all the employees, not the investors, although maybe the investors would have gotten paid a little by the sounds of it, but Google

1:01:09

basically said, "Oh, we're just going to hire your whole leadership team. We're not going to buy the company. We don't want the company, but we'll get we'll buy a license to the technology." So, Google has a license now to WinSurf

1:01:21

and they have the CEO and leadership team, and maybe the investors got paid something. I think maybe the investors got some kind of carveout. I don't know. But from what I've seen, the employees basically didn't get much of anything.

1:01:34

If you hadn't hit your one-year vesting cliff yet, you weren't getting any compensation from this. And for people who had started their vesting cycle, there was no acceleration on vesting at all. And I don't think anyone got paid at all, you know. Yeah. And maybe they got like a bonus or something, right? But all the employees that

1:01:59

weren't the key employees in this leadership team deal got stuck. Basically the CEO is like, peace out. You take the company. We're gonna put this other, like, head of business in charge as the new interim CEO. Peace out. I'm out. Like, okay. So, first of all,

1:02:17

I don't know you, dude, but come on. You just left your whole team. It's like the ship is sinking and the captain gets off and leaves the crew on the ship, you know?

1:02:29

Dude, this is exactly like that. That is the opposite of what you should be doing, in my opinion. But you know, you do you. So now Cognition has come in, that's the team

1:02:41

behind Devin, and has announced a plan to acquire the remaining product and company. So everything that's left, which includes, as far as I know, all the revenue, right, they have like $86 million at least in forward-facing ARR from what I understood. And I don't even know how well capitalized Cognition is, but they must

1:03:00

have quite a bit of funding to be able to pull this off. No terms of the deal have leaked, I don't think. So, I don't know what it is. I'm sure it's a lot of stock, if I were to imagine. Um, but the nice thing

1:03:13

about what they said is, and I'll share this tweet maybe. Uh, give me a second here. The nice thing is they're at least doing right by the employees a little bit more. So, Cognition signed a definitive

1:03:32

agreement to acquire Windsurf. Um, so 100% of Windsurf employees will participate financially. Basically, I read a summary, and essentially it accelerates their vesting. Um,

1:03:47

everyone basically gets, you know, every employee is coming over. So, there's a lot of good things if you're a Windsurf employee. Maybe you got a little bit of a bonus from the other thing, depending on how long you've been with the company, and then this, uh, hopefully gets

1:04:01

you accelerated vesting. So you're now part of a company that can maybe continue to lead and invest in Windsurf. I don't know the endgame here for Cognition though. They have Devin and now they're going to have

1:04:13

Windsurf. Maybe Devin becomes like Cursor has background agents. Maybe it's like Windsurf background agents run Devin, you know. Yeah, maybe. Well, also Windsurf's not poor. Like, Google paid them a

1:04:28

license fee of like $3 billion. So in some ways Devin just came up fat on their balance sheet. But the terms haven't been disclosed at all, at least not known to us, because it's agreed in principle, right? They have a definitive agreement.

1:04:45

Yeah. But we don't know what the agreement is yet. Yeah. And again, that could

1:04:51

definitely, you know, fall through. You never know with these things. So, you know, Amcar, I agree, but why? Devin plus Windsurf, maybe? Yeah, maybe that's a possibility.

1:05:06

Ward Peters, I don't know what picture that is, Ward, but I don't think that's you. Ward was punk rock back in the day, I guess. All right. The different hairstyle. I like it. Money. Yeah, money.

1:05:16

So yeah, that was the first news. I don't know, it's just been a wild story over the last week. But here's the thing I take away from it: coding agents are the future, IDEs are not. Why do you think they hired the Windsurf execs? Okay, cool, but I'm thinking, why did they do that? To work on Gemini, right? That's because Gemini is going to have a coding agent

1:05:43

too, and it's just going to be the future. Maybe they don't care about Windsurf for the IDE. Yeah, maybe they want to learn from Windsurf and then build some kind of what would be the new IDE of the future. Maybe that's what they want to try. I don't know that they can do it, but maybe they want to reimagine

1:06:01

what a coding agent should be. Yeah, like the new IDE is not a freaking text editor. Maybe, but maybe parts of it are, but it's not straight up, you know, the code's maybe not as important potentially. Yeah, it'll be interesting to see.

1:06:21

Maybe the IDE is like the command line: you still use it, but you only use it when you have to, right? You don't need to use it as much as you used to. Yeah, money's being moved around in this space, right? So, yeah, there's going to be some change in the dynamic.

1:06:44

It's exciting. It is. All right. Next news topic. We talked about this briefly

1:06:51

with Andy. Grok 4 came out last week. That was a pretty big release. Well, first of all, Grok on X was having some, you know, some questionable comments being shared. You know, Grok, for

1:07:09

those that don't know, you can basically ask Grok to comment on things or fact-check things live on X, and it'll just respond using one of the Grok models under the hood to answer, and it was kind of going off the rails, like it was saying some pretty bad stuff. Definitely uh not

1:07:28

things I want to share, but yeah, the guardrails were not intact. I'll just say that. Um, but then a few days later, the CEO of X, I think unrelated, stepped down. So that was kind of a big deal. X and xAI are I guess technically separate, but then I think xAI has now

1:07:49

acquired X, or they merged or something. So that happened, and then Grok 4 came out, and it was a big release. By a lot of the benchmarks, Grok 4 Heavy is, I would consider it now, the state-of-the-art. Dude, isn't it crazy, dude? It's so crazy, dude. Like, they're just restructuring their organization to

1:08:12

become like a model company, dude. Everyone, you've got to look through the news to see the strategic moves that are happening. There's so much chess being played here. That's so sick, because then how else would you raise

1:08:26

another round of fundraising, right? And like, xAI is not X. Touche, dude. Honestly, it's great. Also, SuperGrok is super cool.

1:08:39

Have you? I've been playing with it. Yeah. Yeah. It's been fun, dude. And I also saw that they released

1:08:46

like um companions or whatever. Really? You can make little like Yeah, you can make like companions or something. They could be like little

1:08:57

avatars and So, because I played around with SuperGrok. You're not on the plan that lets you use Heavy though, are you? Um, there's a $300 a month plan. No, I'm not on that. I just have like Grok 4 expert or something.

1:09:14

Yeah, but yeah, I think all the benchmarks were run on Grok 4 Heavy, and for the most part Grok 4 outperformed almost all the other models. Uh, you know, I'm summarizing a bit here. Go look at the benchmarks. You can see

1:09:33

it's probably considered the state-of-the-art. I don't think it's meaningfully better at anything. But the fact that they were late to the game and now they have what is considered, at least right now, the state-of-the-art model is pretty impressive, because Grok came out way

1:09:51

after OpenAI and ChatGPT launched, and after Anthropic and Claude, so the fact that they're now the leader is pretty impressive. I don't think their lead's going to last long. I'm sure there are going to be other models that come out and surpass it. I also think maybe we're past the age of benchmarks, because the

1:10:11

problem with all these benchmarks is that once they're around long enough and people are talking about them, that gets into the training data of the new models, and so the benchmarks, at least the older ones, eventually become less important because the models get trained on how to solve those specific benchmarks.

1:10:29

But it is still interesting. It's very cool, super cool. And it's interesting that Grok 4 with Python plus the internet scores very well. Um, where are the JavaScript models at, dude? No one's doing JavaScript plus internet.

1:10:50

All right. Seems like there's an opportunity, maybe. All right. Should we move along? Let's do it. So, let's talk about AI browsers. So,

1:11:00

two things here. So, OpenAI is reportedly releasing a model or a browser. Sorry. They already have many models, but

1:11:13

they're going to release a browser, maybe. But it's an announcement, and it's kind of like, okay, cool. You announced it, but you didn't have anything to show for it? You just said we're going to do this. So, I wonder if they wanted to

1:11:31

say, you know, as this kind of alludes to, which we're going to talk about next, Perplexity has released Comet, which is an AI browser. And I think OpenAI has maybe been working on it or whatever, but they wanted to make sure that they're not forgotten, that they want to be in the game. And we got a comment from Goose. Goose is here. Goose is back, by the

1:11:56

way. Goose is back. Goose is loose. The goose is loose.

1:12:02

I had beers with Goose on Saturday. All right. Yeah.

1:12:08

Hanging out with the goose man. All right. Yeah. Everyone is building AI browsers though.

1:12:13

Yeah. So, I mean it's kind of interesting though. You know, OpenAI announced it. I

1:12:21

They have a lot of data. I'm curious if that's going to be a separate app. Is it going to live inside their current app? Are they going to launch a browser that's a completely separate app? I don't know. That's kind of

1:12:33

interesting. Uh, and for those of you that didn't know, having a beer with Obby, it was awesome. I agree. Normally, it is awesome. Uh yeah, but Comet by Perplexity has

1:12:47

also been released. Yeah, but you know, the funny thing is that one's not surprising at all. It was essentially on Perplexity's roadmap. So, here's a

1:13:03

Do you think Sam Sorry, do you think Sam Altman just gets asked, "Oh, are you gonna do AI browsing?" and he's just like, "Yes." And then people write news about it, and then they're like, "Oh, do you think you're gonna make a robot?" He's like, "Maybe someday." Then

1:13:15

they write about maybe. Yeah, maybe. Like, and who knows?

1:13:20

And then someone inside OpenAI was like, "Shit, I guess we're gonna have to build this." Oh no. That's how the roadmap gets created. They're like, "Oh no."

1:13:32

Yeah. And who knows? Maybe we'll find out they've been working on it for a while. Maybe we'll find out they're now like scrambling to ship it.

1:13:38

Yeah. Uh, I think obviously OpenAI is well positioned because they have such a consumer audience with ChatGPT. But actually, I don't know. I think it's very hard to ship a good browser that gets mass adoption. I

1:13:50

think that's going to be very hard. My other question is: when do browsers become less important? You know, I use a browser to do something specific, like I want to book a reservation at a restaurant, right? I go to the restaurant's website and I click

1:14:14

on it, or I use an app, right? But I'm typically using a browser to accomplish some task. That's a lot of my browser use. Of course, sometimes I'm doing research, but again, maybe I can use ChatGPT or

1:14:26

something else to do that entry-level research. So my question is, does your use of a browser go down? Does it decrease when you have agents that can book that reservation for you, book that flight for you, do the initial research, and you just maybe have to go to sources? So I

1:14:47

do wonder if people's browser usage actually decreases over time. And so I'm wondering if a browser is maybe like the command line: is it a dying technology? Meaning not that it's going to go away, but it's just going to become less important. I don't know the answer, but it's something to think about.

1:15:05

That's true, but then, you know, the benefits of a new browser are that you can have built-in tools and stuff that could become new standards, um, for purchasing and everything. Maybe the runtime can change. I'm definitely in the market. So I haven't tried out Comet. I don't know if

1:15:25

it's actually public access or if you have to get an invite. I'm not sure. But let me know in the chat if you know. Can I just go sign up for Comet? Does anyone know? But I am in

1:15:35

the market. I'm in the market for a browser. My Chrome started being really slow.

1:15:40

So then I was like, I'm just going to try Safari, and now I'm back to Chrome. I just couldn't do it. For those of you that like Safari, I don't know how you do it. Just

1:15:54

too many things didn't render right or didn't look right on Safari. You can't use some tools on Safari. Come on. So anyways, I am in the market for a new browser. I'm going to be trying out Comet and, you

1:16:07

know, any of the others. I'm going to try to pick a new browser, and I'm hoping I don't just stay with Chrome. But speaking of, here, let me show people the All right. So, it's still browsing the web, right? But

1:16:38

then you have extra tools. So, now you have client-side tools that you can interface with. And then, yeah, this is dope, right? You know, you have a little AI assistant anywhere on the web. I wonder

1:16:52

how that actually works, though. You can have links into stuff. Autofill is a cool feature.

1:17:06

YouTube, that's dope. Amazon. So, yeah, building tools. According to Vorpas, you

1:17:13

need a Max subscription and you need to get an invite code. So you apparently need Perplexity Max plus an invite code to get access to Comet. So if there's anybody watching this that has an invite code. This is Chromium though. Like this is

1:17:30

going to be Chrome plus all their That's what it is. Which is cool. I'm not against it. That's tight. Will I use

1:17:37

it? If you have an invite code, I will sell you ad space on this live stream to get an invite code. I'm kidding. Trade invite codes for We'll just pump whatever you're selling.

1:17:49

Yeah. What are you selling? Give me an invite code and I will talk about it for at least 60 seconds.

1:17:55

Yeah, for sure. Just give me one of those. I'll talk about it for 61 seconds. All right. There's a competition. Who's

1:18:02

Who do you want to talk about it? Send one of us your invite code. All right. Uh, related though, you know, you

1:18:09

say Chromium because it is kind of just built on top of that. There's another tweet that I found interesting, so I'll share this one. Swyx tweeted out about a week ago: Chrome 138+ ships Gemini Nano for every user. So basically, if you're using Chrome, you have a local LLM

1:18:38

right in your browser. Again, it's Nano, so I don't know the performance of it, but it's very small, and it looks like you still need to turn on a flag for now. So you can go here and see how you can set it up and turn on the flag.

1:19:01

So that is interesting. It's very interesting that now every person that uses Chrome has an LLM on their system. It's basically easily available. Probably makes extensions a lot easier to build, too. You can just get models

1:19:20

there. Yeah, it's kind of interesting. What? Yeah, that's a great point. I didn't even think about that. But do

1:19:25

you have access to that when you're building a Chrome extension? Because now you could tap into it. It might take three years to expose it, but you know, eventually they'll have it there for sure. Yeah, in the browser, like when you're making scripts or something locally, or in a client-side use

1:19:44

case. Okay. I mean it's cool though. Yeah.
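For anyone who wants to poke at the built-in model, here's a minimal sketch. The Prompt API is still behind flags and its surface has changed across Chrome releases, so the LanguageModel global and the availability strings below are assumptions from recent builds; check the current Chrome docs before relying on them.

```ts
// Sketch only: feature-detect Chrome's built-in Gemini Nano and prompt it
// on-device. Names are assumptions based on recent Chrome builds.
declare const LanguageModel:
  | {
      availability(): Promise<string>;
      create(): Promise<{ prompt(text: string): Promise<string> }>;
    }
  | undefined;

async function summarizeLocally(text: string): Promise<string | null> {
  if (typeof LanguageModel === 'undefined') return null; // API not exposed
  if ((await LanguageModel.availability()) === 'unavailable') return null;

  // create() may trigger a one-time model download on some machines.
  const session = await LanguageModel.create();
  return session.prompt(`Summarize this in one sentence: ${text}`);
}
```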

1:19:50

You're still going to have to hit a server for the tool calls if you have them over there. So yeah, I don't know. We'll see what happens. And again, now are you talking about client-side tools? I feel like this was

1:20:03

well scripted. Like, this is flowing really well. Maybe to you all listening it's not, but it was not scripted. It's dumb luck that these stories kind of weave together in a somewhat cohesive way. But speaking of

1:20:16

client-side tools, all right, we added remote DOM support to MCP UI. So now you can render UI from MCP servers in the host app's own style and look and feel. Wow.

1:20:34

So it's kind of neat, kind of cool. It's the idea of this client-side UI, and maybe you can have your own styles or your own stylesheet or your own preferences. So, you know, when you think about what Chrome extensions could do or what generative UI could do, some of this stuff starts to come together.
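For a sense of what that looks like on the server side, here's a rough sketch using the @mcp-ui/server helper; the createUIResource name and the remote-DOM option shape are my best recollection of the mcp-ui docs, so double-check the package before copying.

```ts
// Sketch only: an MCP tool returns a remote-DOM resource instead of raw HTML.
// The script builds the UI from host-provided elements (ui-text, ui-button),
// so the host renders it with its own components and styling.
import { createUIResource } from '@mcp-ui/server';

export function deployPromptResource(appName: string) {
  const script = `
    const text = document.createElement('ui-text');
    text.textContent = 'Deploy ${appName}?';
    root.appendChild(text);

    const button = document.createElement('ui-button');
    button.setAttribute('label', 'Deploy');
    root.appendChild(button);
  `;

  return createUIResource({
    uri: `ui://deploy/${appName}`, // hypothetical resource URI
    content: { type: 'remoteDom', script, framework: 'react' },
    encoding: 'text',
  });
}
```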

1:20:57

Yeah, that's cool. It's kind of what we wanted to do with our Gatsby plugins back in the day, remember? You could have CMS plugins that then render widgets. Yeah, it's all the same. And then the last bit of news, and this one I just saw today. Haven't really looked at it, but I'm going to share it anyways. Introducing MCP evals,

1:21:26

an eval library for MCP servers. That's cool. Okay, cool.

1:21:32

That's pretty awesome. So Kyle is at Browserbase, I think. Yeah, still over there. So that's cool. Kyle's getting a lot of I'll give you a

1:21:44

like. All right. But yeah, if you're looking for MCP evals, that's something to maybe check out and look into. All right, dude. That's the news. Hey

1:21:57

chat, if you're listening, any news we missed, anything we should cover, let us know. If you are just tuning in, this is AI Agents Hour. We do this every week. We

1:22:08

did some updates. We talked to Professor Andy and learned some ML. We talked reinforcement learning and how to go about getting started and thinking about it, what it is, how it compares to fine-tuning, all that kind of stuff. Then we did all the AI news. We talked about Windsurf, Grok, and a

1:22:26

whole bunch of other things. AI browsers and much more. Yeah. What a good day. Yeah. And we're not even done yet. So, I'm so glad I switched over

1:22:41

to Cursor just in time. It was the right timing. Well, I mean, when a company gets acquired, right, there's always a period after that. You should probably not use their products until

1:22:53

they figure it all out, I mean. Yeah, potentially. Potentially. I mean, you were a Cursor user, then you switched to Windsurf, and I was

1:23:05

the same way. Like, I switched to Windsurf for a while. I didn't stay on Windsurf as long as you did. And then we

1:23:10

eventually found our way back to Cursor. Oh, my long lost love Cursor. Yeah. But now I'm like Cursor plus Claude Code

1:23:21

in my opinion. That's my jam. Coding agents, dude.

1:23:27

I had a chance to use the Gemini CLI as well. Shit's tight, dude. I'm starting to not believe in IDEs anymore. That's my new thing. I don't know if I believe in IDEs

1:23:39

anymore, except I use them, of course. But yeah. Yeah. The age of the IDE is

1:23:46

coming to an end. Yeah, dude. I'm hipster right now. No IDE.

1:23:52

If you're using pure Claude Code, you're super hipster right now. Yeah, I'm not quite there. But I do like using it in the CLI. My mindset's different when I use it in

1:24:05

the CLI, because I'm like, I'm just gonna send this thing off and I'll come back and check on it in a little bit and maybe continue it on. When it's away from my IDE, I feel like I'm in a different mode, like I'm a taskmaster, just sending off some tasks and letting it go. And then, you know, when it's in my IDE, I feel like it's more active. Like

1:24:23

I'm actively building this with you. So it kind of depends, it changes with my mood: do I want to build it, or do I want you to just build it and then I'll come fix it later? 100%. Come to our workshop on Thursday.

1:24:34

Yeah, we'll talk more about it. All right, so let's talk about CLI templates. Cool. So, this is a sneak peek. People that are watching this, you get to see it

1:24:46

before it's even available. So, you know, bonus for watching an hour and 24 minutes into the live stream. It should be available tomorrow with the release, assuming nothing drastic happens in the next, you know, 12 hours. But I'll give you a quick little demo. We'll talk about it and then hopefully you'll be able to see

1:25:11

what's coming. So first of all, we talked about this briefly in the Mastra updates, but we have this new templates page. You can go there now. You can click on a template, view it on GitHub, clone it. That's nice. It works.

1:25:30

But we wanted to make it even easier, because very often if you're learning this stuff, you want to be able to try things out, see how the code was written, see what it does, test it. We want to make it as seamless as possible for you to look at a template, run it locally, try

1:25:48

it out, and then see if you can learn from it, if you can use it. Maybe you just copy the code directly or edit the code, but ultimately it's about giving you a really good getting-started experience. And so with that, let's do a real quick demo of something that's coming. It's

1:26:05

not the end version. It'll be slightly different tomorrow. You'll see that not all the templates are there. But you can run, don't worry about the first

1:26:16

part of the command. This is just because I'm running it locally. All that. Imagine this just said the word

1:26:21

mastra if you use our CLI. But you can run mastra create and just pass in a template flag. And then the same templates that are available online will be here.

1:26:33

We don't have all of them available yet, but it tells you a little bit about it. How many agents, how many tools, how many workflows. If it had an MCP server, it would show that. If it had an agent network, it would show that. So,

1:26:46

you can see how we'll continue to expand this to provide templates that, one, are usable, like you might actually take one as your starting point and use it, but also are good for education if you want to see how a simple version of something is built. It can provide a good starting spot for you to learn from or branch off of and start using. So if we wanted to, you know, use

1:27:09

let's say the browsing agent, I give it a name. This is going to go ahead and do the clone from GitHub for me. This will take a second.

1:27:30

It's taking its time, but the ultimate goal here is really ease of getting started. The templates page on the Mastra website will provide you the commands, so you can just run them directly. So it says it's been

1:27:49

cloned. It was installed. So I could cd into browsing agent. Now the one thing

1:27:56

that you will have to do is, and I can open this up in Cursor. You will need to update you can't see my Cursor screen. Let me show that.

1:28:12

I'll give you a couple clicks of zoom here. But you'll need to actually update your environment file. In this case, this uses OpenAI, so I'll have to get an OpenAI key. This browsing example uses Browserbase. So you'll need a Browserbase project and API key.

1:28:30

But then you just update that. You can run, you know, mastra dev and you'll have all those agents and workflows and anything else that comes with the template available right in the playground. One more quick thing that you can do, and this will again be on the Mastra website. If you do know the

1:28:54

name of the template, you can just pass that in directly. So it could just be like deep research and it will automatically know that you're referring to the deep research template that's available. It'll ask you to give a project name and it does the same thing, right? It just clones it, installs it, and then all you got to do

1:29:13

is update some environment variables and you're off and running. So, yeah, coming to you soon. Available, hopefully, if all goes well, tomorrow, so you can start testing out Mastra templates much more easily than before.

1:29:33

That's it. That's the whole segment. Any comments on that? I think it's cool. I think there's

1:29:40

more to do there after we ship it to make it even cooler, but yeah, it's a good starting spot. Yeah, we get a lot of questions on how to do things. So, these templates are going to help because they show, like, this is how you do it. Yeah. And I always look at

1:29:57

go down, you know. Yeah. And I look at, like, okay, so right now we're curating all these templates, and we'll always have our curated templates, but maybe it becomes possible for others to share a template between their team or something, right? Maybe we don't have to be the arbiters of all the templates. So maybe at some point that becomes possible. Not today, but

1:30:21

Oh, funny enough, at the same time I was saying it, And says, "Can you create your own template?" Maybe, maybe. It'd be, you know, not officially supported today, but maybe someday we'll

1:30:34

make it so you can just pass in a URL, like a GitHub URL, and you can use a public GitHub repo as a template or something. I could see that being a nice-to-have feature. So, it's a good idea.

1:30:52

All right. So, I did want to do one more thing because I just thought it'd be fun. So, wanted to end on something else. This one probably won't take very long, but a while back we launched Mastra

1:31:06

Cloud in public beta and we did all these funny little 8-second videos. So, I made the joke that, you know, there used to be eight-minute abs. People want that quick reward. So, in the 90s or the 2000s or whatever it was, there was

1:31:23

eight-minute abs. That was the big thing. And now we're in the age of 8-second ads. You can really

1:31:29

quickly and easily build an 8-second ad video. And that's all people's attention spans are anyway. People that are consuming TikTok and YouTube Shorts and all this are just watching five to 10 seconds of a video and then going on to the next thing, right? It's those quick, short things. So we built a whole bunch

1:31:46

of 8-second ads using Veo 3, and I thought it'd be cool for people to, one, brainstorm how this kind of thing could be used, especially when tools like this are available through APIs, because I think you can build some really cool things with agents, but also just to see how these kinds of things are made. So maybe the first thing is I'll share

1:32:10

the video here and you can see all the funny videos that we shared. Let me find it. All right.

1:32:35

So hopefully you can all hear this when I I'll play it. You can You can just deploy things. You can just deploy things.

1:32:57

You can just deploy things like this one. You can just deploy things. The robot was crying. You can just deploy things.

1:33:16

It's always funny that sometimes the AI just adds extra laughs in the middle of it. You'll see that a couple of times. You can just deploy things.

1:33:29

You can just deploy things. Oh yeah, this one was This one was the coolest one. You can just deploy things.

1:33:45

You can just deploy things. I think everyone gets the idea. There's only a few more left, so we'll watch. You can just deploy things.

1:33:58

The fake NBA/MLB team that I created with the hat. You can just deploy things. The whole idea was: how many possible ways can you get a robot into the clouds?

1:34:17

You can just deploy things. Woohoo. And there it is. Yeah. So, there's I

1:34:28

don't know. I think like a baker's dozen. I think there's like 13 of those or something. Um,

1:34:34

and yeah, the whole idea was: how do we build a whole bunch of funny things that people could share and, you know, talk about? And we got some pretty good feedback, like, hey, how did you build those? That was the most common question. So, I figured I'd show real

1:34:52

quick how I did it. So, if any of you wanted to build videos like that, you'd know how to do it. A few uh, you know, Jed says he loves it. Cool. Thanks. Couple questions

1:35:07

before we go into the actual how. Not necessarily related, but maybe we should answer some. One thing that'd be nice is a kitchen-sink template that could demonstrate integration and use cases from the client side of things. So maybe a more

1:35:24

advanced kitchen-sink template. It's kind of a good idea. That is a good idea. Yeah, I mean, if you have other template ideas, please share them. You

1:35:35

know, we obviously came up with ones based on what we've heard from people on Discord, um, people asking us in the Slacks that we share with a lot of our customers. So we try to get a good baseline of simple things that you could learn from, but we are always looking for new ideas on what other templates people would want, what would be really useful for people.

1:36:01

All right. So, with that, let's see if I can pull up Veo 3 real quick, and I'll show you in maybe, you know, 10 minutes how we did that. All right. So, for those of you, Veo 3 is

1:36:22

not a great product from the way they market it and design it, but it is a really cool tool. So, you need to have Google Flow, and, like, I have an Ultra subscription and I have so many credits, right? So, Obby, think about something you might want to see. If you have a picture, we can animate a picture if you

1:36:39

have something like that. Otherwise, think about what you want to make a video of. So, I'll show the project that I was using for all of these robots getting launched into space.

1:36:52

And I'll go to some that I didn't use. These are a lot that I didn't use. If I go further down, I'll get to the ones that I actually did use. There's the ones that I did use. You can see I had to do multiple iterations. I

1:37:04

have a lot that we didn't end up using. Like this one here, I think it's kind of fun. You can just deploy things.

1:37:17

That is a good one. But you can see the prompt. So this one says, "A gorilla scientist in a lab coat is standing next to a cute robot on a rocket launch platform. The cute robot has a rocket attached to his back. The

1:37:28

gorilla says, "You can just deploy things." As he pushes a red button on a remote in his hand. The rocket takes off and the robot flies into the sky screaming wildly. The clouds form the word mastra. So, you can see like it was

1:37:40

pretty detailed for a prompt, but it wasn't that long, right? That's a couple of sentences and I was able to get this. You can just deploy things.

1:37:55

And so that's roughly how Veo 3 works. There's a couple of tips if you are going to use it. It doesn't default to Veo 3. They don't want you to use it. You've got to click on it and select

1:38:07

that you want Veo 3 with audio. You have so many fast generations, which just produce it more quickly. Quality is a little bit slower, and I don't know if it's actually better quality or not. I'm not actually sure on that. But

1:38:24

just make sure you select Veo 3 and how many outputs you want. You know, I selected two, so I have two options to choose from, but then I'd sometimes have to rerun it. I just didn't want to run four and burn a lot of credits, because it is pretty expensive. I think they reduced the cost. It used to be 100 credits. So I think

1:38:41

they've reduced it silently. I haven't seen any news of that, but it used to always cost me a hundred and now it's 20. So, I think they've given you more generations over the last couple weeks, which is kind of cool. So, one other thing, just some fun things I've been doing. I took a

1:39:02

coloring page for my daughter. Maybe I'll show this one here. So, she colored this, right? And now with Veo 3, relatively recently, you can do So, you can't use

1:39:14

Veo 3 for ingredients-to-video. Ingredients are basically images you can upload, but you can use Veo 3 for frames, and a frame can be the same as an ingredient. I don't know what the difference is. The only difference is you can use Veo 3 if you

1:39:31

go frames-to-video. The thing is, you can actually use multiple frames. So you can say you want it to start here and end here and it'll kind of fill in the middle, or you can just have a starting frame and get it to do things. Like, you know, I said I wanted this

1:39:48

person to animate and pick up the basketball, that kind of work. That's pretty sick. So my daughter colored, my daughter kind of built, with a little help from her grandma, made this drawing, right?

1:40:07

Whoa. Is that game-animation style right there? So obviously she can't make an animation, but she can make a drawing, right? She's three and a half, and then that can turn into an animation

1:40:21

pretty easily. So you can do some really cool stuff with it. You can be pretty creative. And that's just using frames and giving it a prompt. Uh, and then you can

1:40:28

actually open things up in the scene builder, maybe, if it works. I think you have to actually add something to a scene. So, let's just start a new project and try something.

1:40:46

Do you have Do you have an image? What What image should I use? H um let me think of something. Use the Python ships here.

1:41:01

Yes. Yes. Slack me that. Yeah. Let's send the Python trains TypeScript

1:41:08

ships image. Yeah. Let's see if we can animate that thing. Animate it. Yeah,

1:41:17

that's going to be and if all goes well, we'll tweet this thing out at the end. Yeah, that's a Yeah. Okay. All right. So, let me download that.

1:41:31

So, let's see here. That's not okay. I agree. I have Okay.

1:41:52

Okay. Let's see. Do Can I get it all in? Got to do a little cropping.

1:42:05

All right. So, we crop and save that thing. I hope Daniel's watching this because I made Daniel dance at one point. Picture of him in there. Just a random picture

1:42:17

of Daniel. All right. So, first frame. Again, I could add a second frame if I knew where I wanted it to

1:42:23

go. But if we just start with a frame, we make sure we select Veo 3. What do we want it to do? The Python train arrives at the yard

1:42:37

and the TypeScript bots are loaded, or something. I don't know. Yeah, I was thinking the TypeScript ship has to destroy the Python train somehow, right? But the train is delivering agents,

1:42:55

I guess, you know, like LLMs. The Python train arrives at the yard and the train loads the crates of robots onto the TS ship. Yeah, something like that. Yeah. See how it goes.

1:43:17

All right. So, this will, you know, take a few minutes, but ultimately, you can see it was pretty easy. Like, this was a pretty short prompt. I don't know what we'll

1:43:30

get. You know, it's like rolling the dice with this thing, right? You never know. Sometimes it's

1:43:36

uh, pretty cool. So, let's answer some questions while this is running, because I think it's a good time. All right. So Amjad says, "Can I use self-hosted Mastra as my main server

1:43:48

with custom routes and logic?" Yeah. Yeah. Yes. That that's one of the reasons we

1:43:54

really wanted to build it the way we did. We consider Mastra a framework because you can run it as your entire backend, meaning there's the ability to add custom routes in Mastra. Uh, so you can have your own logic. We have middleware, so you can basically run it as a backend to your application if you want. It's all, you know,

1:44:10

bundled as a Hono server, right? So, it's like, yeah. Yeah, you can just do it.
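For Amjad, a minimal sketch of what that looks like; the registerApiRoute import path and the server.apiRoutes config shape are from memory of the Mastra docs, so verify against the current version.

```ts
// Sketch only: run Mastra as the whole backend and hang a custom route off
// the bundled Hono server alongside the agent/workflow endpoints.
import { Mastra } from '@mastra/core';
import { registerApiRoute } from '@mastra/core/server';

export const mastra = new Mastra({
  server: {
    apiRoutes: [
      registerApiRoute('/health', {
        method: 'GET',
        // The handler receives the Hono context, so Hono middleware and
        // helpers compose with it.
        handler: async (c) => c.json({ ok: true }),
      }),
    ],
  },
});
```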

1:44:19

Um, car, I don't know what this was in reference to, it was a while back, but: let's build a coding agent with Mastra. Maybe that was for the videos. I don't know. Cool. And what a time to live. I agree.

1:44:36

What a time to have enough money for credits. Yeah. What a time to buy AI credits. Yeah, we need to get sponsored by Veo 3.

1:44:47

Honestly, we need some sponsorships, dude. Yeah, we need If you are a company that sells energy drinks or makes energy drinks, come on. We're waiting. All right. Should we see what we got? Energy drinks.

1:45:07

What's that? Yeah, I was saying we don't even need money. Just give us energy drinks. Yeah, just free energy drinks. We

1:45:13

will promote it. All right, let's see what happens. All right. Holy

1:45:23

dude. That is so cool, dude. Dude, we are going to ship that one. That is good.

1:45:28

Shipping that one for sure. The second one is better, though. The second one is better because the eyes don't look weird.

1:45:42

Yeah. And it goes and picks up I mean, some extra robots pop up, but it picks an extra one up off the train. That's pretty cool.

1:45:48

Oh yeah. I like that. Look at the eyes on the other one.

1:45:54

That one's cool, too, though, dude. But look, we like that. That's a pretty cool little uh, dude. We've got to ship that. Yeah. Either one. I'm down. They both look tight. Well, I'll send you one, you ship

1:46:08

one, and I'll send a different one, and then we'll see. People can vote on the one they like the best. So, if you're watching this, go follow Obby on X, right there, Obby. Follow me, SM Thomas 3. We're gonna each tweet one of these and we'll

1:46:30

just base it on likes. Which one was better? So go ahead and like the one that you think was better out of these two generations. We'll do that right after we end the live

1:46:41

stream. Yep. That was cool. Yeah, that was cool.

1:46:49

It's so easy to work on this stuff, and it's not perfect, but it's still a thousand times better than what you would have to do yourself, because you can just do things now. Yeah. I can't imagine a world where I could have ever animated that Python train and that ship.

1:47:07

Yeah. And we did it in 20 seconds. Yeah. I don't have the patience to learn things like that, you know.

1:47:14

Yeah. I would never. But if you have the ideas and the creativity now, you can just make things. That's

1:47:21

pretty cool. It is a wild time to be alive. As Amar, you know, previously said, what a time to live. Agreed. The Python train to the TypeScript ship

1:47:32

is now animated. We should make some t-shirts, too. Yeah, there's an agent for that, I'm sure. Yeah. Or you just use a web service. But all

1:47:44

right. Uh, should we wrap this thing up? Yeah, let's do it. All right, everybody. Thank you for

1:47:50

watching AI Agents Hour. You got some updates on Mastra. We showed a sneak peek of CLI templates. We learned some ML

1:47:57

from Professor Andy and talked about reinforcement learning and a whole bunch of other things. So, you know, watch that if you want to, if you're a builder like us and you want to learn a little bit of ML. We did a whole bunch of AI news. We talked about, you know, what the hell's going on with Windsurf. We talked about

1:48:14

Grok 4 and a bunch of other things. And then we showed how we were able to build an 8-second video in Veo 3, and how it's pretty cool. You can build some really interesting things really easily. And we have a whole bunch of tools that

1:48:29

we can use to build creative videos or whatever you want to do. Yeah. Some of the tools that we have access to. And that's it. It's a good show.

1:48:40

Hour and 50 minutes. That's a good mark. 225 people. Yeah. Thanks for being here, Admirals of

1:48:47

AI. Peace out. See you.