Parallel CLI Agents, Slash Commands, Opus 4.5, and more models
We talk with a special guest about running parallel CLI coding Agents. Abhi goes into depth on slash commands he is using as well as a general Opus 4.5 review. There are a ton of new models (including some new video models) that we will chat about!
Guests in this episode

Kiet Ho
Superset
Episode Transcript
Hello everyone and welcome to AI Agents Hour. I'm Shane. I'm here with Obby. What up? And we're coming at you on
Monday around noon Pacific time, mostly on time today, which is great. And yeah, we got a sweet show ahead for you. How's it going, dude? Dude, it was good. Fresh off the
Thanksgiving break. Feeling energized. What about you? Yeah, it was good. It's nice to see some
family, you know, get get to see some of the extended family I don't see too often. It's actually, you know, it's nice. Don't want to do it too much though, you know, like a couple times a year is good. But but yeah, it is nice.
Yeah, I'm still in the uh Manor with my brother and my family. Yeah, near the Austin area, right? Yeah, the Austin area. Dude, I actually had a Thanksgiving question for you. Do
you like Thanksgiving food? I feel like this is a loaded question. So, yes, I do. I don't I'm not a huge
fan of turkey. I would say I don't think turkey is like the best food, but you get some, you know, mashed potatoes and gravy, some some stuffing or dressing, whatever you want to call it. You know, you get some ham. I don't know. There you get all that stuff in there. You get some pumpkin pie. Love me some pumpkin
pie. I don't know. I I So, I I do I I'm a big Thanksgiving food fan. I usually
eat too much, but turkey that's hit and miss. I could I could go without it. What about you?
All about the ham, dude. That's where I'm at. About that honey ham. Yeah, I agree. If you're listening, this
is live. What do you think? What's your favorite Thanksgiving food? Is turkey overrated? I don't know. Let us know
if you celebrate, I guess. Yeah. Yeah. If you celebrate. Not everyone uh celebrates depending on
where you're at. in ethnic households like Thanksgiving is like some Thanksgiving style stuff but it's a lot of like you know for us like Indian food as well but like my parents don't eat meat or anything so we had Mexican food for Thanksgiving this year because we kind of just do that but like at Friendsgivings and stuff everyone does
like the whole thing like the sweet potato casserole and that stuff that makes me fall asleep after eating it. So it was a good it was good. It's It's the definition of food coma, dude. Definition. But I needed that, dude. I really needed a nice little break. It was good.
Yeah, I worked during the whole break, though. But I felt relaxed for some reason. So, Dev's here. What's up? What's up? Yo, Dev, I looked at your MCP wish list.
I'm considering it. Just letting you know. Just considering it. Nice.
Yeah. So Thanksgiving was good. You know, we got a good show today. We're going to be talking about slash commands
and Opus 4.5. We're going to be doing all the AI news. A lot of model
not like huge models. I mean, I guess maybe one's kind of big, but just a ton of models coming out. Not like the Opus 4.5 drop or anything, but definitely quite a few models we'll be talking about. And then we're getting a
uh, having a friend come on and talking about parallel CLI agents and what that means. So we have a good show today. Shouldn't be too long today, but you know, we'll go an hour, hour and a half, something like that. If you are watching
this for the first time, please leave us a review if you like the show. If you want to give us a five star on Spotify or Apple Podcasts, we appreciate that. Um, if it's not going to be a five star, you know, don't bother. It's okay. We can't win them all. And uh this is
live. Leave us comments as you go. If you have questions on YouTube, on X, on LinkedIn, wherever you're watching this, just, you know, say hello. Dev says, "Thanksgiving food is meh.
Nothing compared to Holi and Diwali food." All right. Yeah, that's why.
Where should we begin on Opus? Yeah, maybe let's talk about Opus and then we'll dig into some of the other stuff. I'm gonna overhype some stuff, because yeah, this is new. You know, normally you don't get too overhyped. So, let's do it.
So, like um, I think Opus is a... oh, thanks Dev. So, I'll close that then. Um, dude, I really think Opus is a game changer. I really think it is a game changer, as much as it felt when 3.5
came out, like that same feeling. Um, that same feeling I got when I was using 3.5, where I became instantly more productive. But I was instantly more productive at doing like one thing or like one task. I just became better at doing it. And I think a lot of people on Twitter are also coming to this realization about Opus, which is like it's really
good at, it's really good at just doing whatever you want it to do, um, especially with the right context. And it also shows that a lot of our coding agent tools have gotten a lot better at context management and how Opus uses this context, and then it also, you know, produces really good code. Like it's really good. It thinks about the things
that you want it to think about. It goes deep. I'm using Opus thinking mode, too. I'm spending a lot of money, but like
it's so good and it makes your productivity more than yourself. And so then I'm thinking now, like with Sonnet 4 and previous versions I've tried to multitask all the time, mainly because I have ADD. Like I want to be multitasking all the time, except I had to babysit too much when
doing multitasking with the previous models. Now I don't babysit at all because it does the defined, in-scope things really well. I just have to check on it to go to the next stage of my development. And so now like all the
products, we're going to have a product we're going to talk about on the stream today, now these products that are about parallelism become even more capable, because now you don't have to babysit. You can actually start many things at once and it's on you to pick up where it left off. Have
you tried it yet? Uh, so I've used Opus 4.5 just a little bit in Claude Code, and I will say it mostly just oneshotted the things I need, but I feel like they weren't overly complex. I haven't really put it through its paces yet, but I'm pleasantly surprised. So, I've also
experimented with running multiple sessions of Claude Code, you know, with git worktrees and all that, or just different projects. I've always felt like, in the past, it was so much babysitting that the juice was not worth the squeeze. So you telling me that maybe the juice is now worth the squeeze, maybe I will try, you
know, for me it's often like I kick off tasks while I'm in meetings and doing other things, because I don't write quite as much code as I used to. But I feel like I should just always have something running, you know? Like why not? If you have a few minutes, you can basically move a feature or something along, and so
why not just always have it going? Exactly. And that's kind of where we're moving to.
So like the way I evaluated Opus, and this is how I evaluate everything: I daily drive. Like I do multiple models for different things, I daily drive a couple of them. But now I'm opusing everything just so I could see
what it's like. Right? So like the categories of work that I'm doing. One: existing codebase. There's a bug and
there's a reproduction, right? Boom. Opus, go figure that out for me. Thank you. And um, cool. The second thing
is I have a bug, or like a feature request or something, that is changing the system. It's not necessarily just "go fix it," right? Let's plan together. Let's figure out what the solutions are. You know, I'm talking to myself here, right, when
I'm talking to it, but it's like, yo, help me come up with the right decision. Right? Then there's: let's go explore some stuff that's so out there and come back to me with some stuff, and I'm not going to act on it. I just want to read it and be like, "Wow, the future looks cool if we did that." Right? So, I have all these
modes, and before, like, I was using Composer for fixing bugs because it was really fast, but the quality wasn't necessarily always there. But Opus, in all these categories, I'm getting freaking good code that I would roast. Not that much. You know what I
mean? And even CodeRabbit doesn't say much on it, which is also good. That's with all the little tests set up. So,
I'm overhyping it for sure, but you all should try it out. Yeah. So, we got some comments. Uh, Dev
says, "My takeaway from recent events is that we need better ways to talk about how models are different from each other." Totally. But we're probably not going to develop this new terminology anytime soon. I agree with that. There's so many
models, it's kind of hard to keep track of what's better at what. Yeah. So, that does seem like a problem. Uh
Daniel says, "It's been a marathon year in AI. It seems like new state-of-the-art models and tools are dropping weekly. What surprised you the most in 2025?"
This is one. Um, and then Nano Banana and all the music stuff like Suno and stuff. Oh, dude. For sure. All the video models getting better. The
thing is, if you looked at it, it's just incremental improvements, right? If you really think about it. But every incremental improvement unlocks things that probably previously were not possible. And so that is what's exciting: you stack enough of these incremental improvements and it feels like you're in a different reality than you were 12 months ago. Even though individually, yes, this model is
better, but I think it just unlocks things you maybe didn't understand, because it's not like it's so drastically better. It is obviously better, but I think it just unlocks that next level of autonomy that you didn't even really consider before. I bet we can make a compilation of all the stuff that's happened this year from just our show. Yeah, we should. That's a
useful project of just like dropping a whole bunch of things. Uh, Dev says Opus being so much cheaper is at the top of my list of things I did not expect. That's a power move, too. Yeah. Uh Mole Frog says, "My experience,
Opus 4.5 is way better at oneshotting good UX and UI." So I think others have seen that too. I did the bomb game, one-shot the Bomberman test. Too easy for it, you know. But like working on Mastra, it's not just a one shot, it's like a conversation. So I'm not saying that you're not having a conversation with this thing, cuz you're building real things, y'all, like with your own code
base, and it's helping you with your job. I love it. I do want to do a little segue into products. Like I said, there's a product coming on later, but I want to be fair to everybody. So I want to talk about the products, and this is not a new idea, this parallel stuff. So let me just share that real quick. First I want to talk about
I actually overprepared for this more than I usually do. So that's pretty good. I want to talk about the architecture of a parallel coding agent setup.
The basis of it is git worktrees. There are several libraries that can help you manage and view worktrees, and a lot of them are built into different products. Each one of these worktrees is a branch which has its own quote unquote agent. So it could be Codex or Claude or whatever the hell you use, and then you're giving tasks
between, like, in each one, right? So these are all kind of siloed. But the future of this looks crazy, because you know, if they're all in your same system and there's a bunch of context here, you could share context between all these agents. So like, and you know, Mastra will be doing some stuff in this area, but like you can share context between
these agents and you can have like shared memory or shared skills, etc., so that if one agent learns something in one task, you can immediately use that in some other task. Oftentimes we work on issues that are related to each other in a way, right? But if you're working on them in parallel streams, it's not going to be good. But if you're working on them with
context of both issues at once, and you're focusing on one, you can make decisions based on the other. This happens so many times in my life, so I'm sure it happens to y'all. Um, so that's just the basic architecture.
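The setup being described here needs nothing but plain git. This is a sketch, with made-up paths and branch names, of a worktree-per-agent layout; it builds a throwaway demo repo so the commands are runnable as-is:

```shell
# Throwaway demo repo so the commands below are runnable;
# all paths and branch names here are hypothetical examples.
rm -rf /tmp/wt-demo && mkdir -p /tmp/wt-demo && cd /tmp/wt-demo
git init -q repo && cd repo
git -c user.email=demo@example.com -c user.name=demo \
    commit -q --allow-empty -m "init"

# One worktree (and branch) per parallel task; each checkout can host
# its own agent session (Claude Code, Codex, etc.) in isolation.
git worktree add -b fix-issue-123 ../fix-issue-123
git worktree add -b feat-export ../feat-export

# See every checkout and the branch it holds.
git worktree list

# Once a task's branch has merged, clean up its tree.
git worktree remove ../feat-export
```

Each directory is a full checkout on its own branch, which is why separate agents can run in them without clobbering each other's files.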
Let's talk about some products and then I'm done. So, and they're all YC, so that's cool. I gotta start with the homie first, Dex. And I gotta give Dex credit because like
he's been talking about this world for a long time. And maybe we're at a point where you can actually pull it off, and it's the right time, right place for this product. So if you like this style, HumanLayer has exactly what we're talking about, but it's their swag
on how they do it. You should check them out. It's pretty sick. Um, next we have another YC company, Conductor. So, run a
team of coding agents on your Mac. It's like a desktop app and then you can kind of... Yeah, I have played with early versions of Conductor and it was pretty cool. But again, it was the problem of I didn't really trust it, and I felt like I would just get more if I just monitored one really closely. Yeah. But maybe now it's getting
good enough where I can give it a little bit longer leash. So maybe it starts to become more interesting. Yep. And then we have, where is this thing? Our guest for later, but
Superset from Kiet. He'll come and talk about this later. I'm not going to talk about it now. And then lastly, you can
build all this stuff yourself. You can use Cursor or Claude Code. Maybe not Claude Code if you don't want to be limited to Claude models, but if you're just using Opus, you can do this in Claude Code already with git worktrees. You're not
missing out on anything by using these projects. But if you want to use different models and stuff, these harnesses, everyone's talking about these agentic harnesses, you can use one of these products as your harness and then do stuff. I'm trying different stuff. I built my own too, a makeshift harness using the Cursor
IDE itself. It's super whack, but it works. Um, and so I'll probably migrate to one of these products if they're good. Yeah, I know like Tyler on our team has his own kind of harness that he's built,
right? So I think it is, you know, I do wonder what the difference between these is. Like where do they win, right? So I'd be really excited to
talk to Kiet to figure out what the differentiation is. Yeah. And I know it probably comes down to a context engineering problem, right? Like if you're a little bit better at context engineering, you can do maybe 10 or 20% more than some of
the other ones. But I am curious to hear that and and hear what the differentiation is. Maybe the differentiation is honestly just like better UX and DX. So literally could be
it's not that much different. It just feels more natural to how you want to work. Maybe that's the answer. I I don't
know. But I am really curious. Design is going to be huge, right? You have to be the best-designed product.
It's like the Linear for parallel agents, you know? It's like, who's making the Linear for parallel agents? Context engineering is going to be super important. All that, you know. Um,
and so, a lot of these are open source too. So, you know, contribute to them and all that. So yeah, try out Opus though. It's legit. It is legit. I've never shilled something like
this. Except, yeah, energy drinks. So you have shilled those. Yeah.
Uh yeah. So that's, you know, a strong statement. Let's use Opus more. And now I'm inspired. I'm just going to send it on some tasks tonight and build some
things. Just let it rip for a while and just see what it can do. Um, I will say I've recently had some good luck just oneshotting entire applications that are pretty good. So I built this time tracking application for my brother. I
basically just two-shotted it; in this case I was using Replit, but um, I could have used Claude Code and it would probably have done a pretty good job too. But it is pretty impressive. He basically needed time tracking for his employees, and he's got like 25 people. He does snow removal. He's like, "I need somewhere to keep track of time." And he could have paid
like hundreds of dollars a month, at a few dollars per head. And I was like, "Dude, I'll just do this, it'll take me like 15 minutes." And it literally took me less than half an hour of actual work. I deployed
a fully working time tracking application with admin controls for him and his whole team. Deployed in production. Cost me a few dollars and it just works. So, it's like personal software, baby.
Yeah. It's like the age of personal software. It does feel like it's more approachable than ever. And so then I started thinking, what else could I do here? And so like, I think there's just a lot of things
that if you think about software you pay for, that you probably don't use to its full extent, and maybe pay a lot of money for if you're a company, you probably could just roll some of that yourself. Like maybe not a lot of it yet, but I think some of it. And yeah, it'll get better the more that people use it and
figure out what Opus 4.5 is capable of. I think also, having a mediocre version of the real thing is good enough, right? When you're hustling, like if you don't want to buy Salesforce, for example, what if you just roll something? You just need it for a temporary thing. You're already using Notion as a database, right? Why don't you just roll
something if that's what you want, right? So you have the ability now to solve business problems that you don't have to pay for. I guess you just pay Replit or Lovable or whoever. Yeah. But a lot of times it's one
of those things where you can own the software then, right? I would much rather, if you give me the option, it's like: do you want to pay every month for eternity, and as you grow you're going to pay me more for the same, or do you want to pay once, and then you have access to it and you can make it better if you want? There are definitely times
where I would just rather pay the service, right, and not care about it. But there are also times where I would love to have the code, right? And I can craft it to fit our business needs and and things like that. Of course, I'm not going to rebuild Slack right now or
anything, right? Although you probably could. You definitely could. We should have
Opus. Let me just tell Opus to go build Slack real quick. One second. Just let it
I'm gonna ask I'm gonna ask it to do it. Don't ask me. Yeah. Don't ask me any questions. Just rebuild Slack. Just go.
see how far it goes. And then it'll burn so many tokens. Yeah. Then at the end of this live stream, check how much it cost us. Yeah. I'm going to mute myself because this keyboard I'm using is so loud. So,
but I am going to start it. Yeah, you kick that thing off. If you are just joining, this is AI Agents Hour. Today, we just talked a little bit about Opus 4.5. We're going to talk a little bit about some slash commands.
Then we're going to talk about AI news. If you have comments, drop them in the chat. We like to talk to you as we do this thing. And we've been talking a little bit about parallel CLI agents or parallel coding agents and how people
are using them, and we're going to bring on a guest later to talk a little bit more about that. All right, dude. I've sent it to do its duty. All right. Uh, so tell me a little bit.
So, one of the things that some people here probably haven't noticed is, if you have looked at the Mastra GitHub repo recently, which I'm going to shamelessly drop here in the bottom of this video. So, go give us a star if you haven't. We appreciate that. But if you have been paying attention, you'll notice the issue count has dropped tremendously
over the last two weeks. Maybe that's Opus 4.5, or maybe it's just Obby beasting on issues with the rest of the team. I don't know. The team's been cooking, but you have shared some things with the
team and with me around some of the slash commands that you've been using. So, I thought it'd be useful just to talk through how we at Mastra are using slash commands, how the team's using them, and of course, if you're listening, we'd love to know what you're using. And do you have your playbooks set up?
Cool. Yeah, I'll get into that now. So, first of all, I didn't realize that some people don't know about slash commands in general. So, let me just do a demo of what a slash command is.
Yeah. And I'll be honest, I don't use them that much. So, I need to start. So, teach me.
Yeah, for sure. Let me get set up here. Um, what should I show? I don't want to spoil anything for later in the show, so I probably shouldn't show that. Um, I'll
use this one. Okay. So, here we go.
I think it's this one. All right. You can see my code, and this branch I'm in is a worktree, and so we have a bunch of uh, commands here. So what is a slash command? A
slash command is just a prompt, uh, that you save. It can have variables, and the input args are the variables. Can you give me a click of zoom? Yeah, dude. Let me do that.
Perfect. Cool. That's good. So, I have um we have a couple of these
that we're just experimenting with. I made this one that has been surprisingly amazing, um, with Opus and me at the driver's seat. Maybe because I built Mastra and I'm one of the creators, I was able to close a lot of issues. So, there might be some context that's in my head as well that's not actually translating to this workflow. But I've had some of the other
guys on the team try it out and it's been pretty good for them as well. So what is a slash command? You just write a prompt, and this one's a pretty small one. So
it's not that big. You can put them in your .claude or .cursor folder. And for this one it's called make-moves. That's like our kind of culture here at Mastra. So we have a make-moves command.
You can pass it an issue, and that'll read the issue from GitHub, and the first work step is to analyze. So it goes and reads the comments and issues. Now you might ask, oh, that's hard, what if the issue is written poorly? That's true. This is more effective when people care about
issue writing and provide reproductions and stuff. Thankfully, our community is amazing. So, when we ask for reproductions, they're always there to give them to us. That can feed into what the research plan is, because step two is
research: given what you know about all the context we just gave you, the threads, everything about the issue, go research the codebase for all the things that are relevant to that issue. Now if you have a reproduction, the great thing is the code is then grepable. So you can see where,
for example, if it's a workflow or an agent, if the reproduction has that code, it can then go find the source, and Opus is really good at finding what package this belongs to. I thought it would not, um, but it does. Hallucinations are still something to worry about though. So you have to be kind of semi in the driver's seat after this research phase, um, if it's on
something ambiguous, right? Uh, I've done some deep research type issues where it just went off the chain, or off the rails, sorry. And it's because I was working on something else and I didn't pay attention, right? So for really good reproductions, it's
so great. It just understands the codebase, and then I have a stage where you prove it. And that's another thing that we've kind of started to adopt at Mastra, which is that fixing bugs is kind of like detective work.
You have to prove it. Like, is this really a bug? Let's make a failing test and have the reproduction actually fail in our tests. And so at this point, once I let this go and I come back to a failing test, usually if the issue is very well documented, I will have a legit failing
test that I can then prompt for the next thing I want to do, which is fix it, or understand what plan it made, etc. So that's make-moves. Then we have, like, you know, if you want to do PRs: for us, we want to start writing our PR descriptions and titles and commits all the same way. So slash commands can help with that, so people don't
have to have different styles. And uh, yeah, the way I think about it is these are just hotkeys for doing stuff, like in World of Warcraft. Yeah. I was gonna say it's basically just hotkeys. Um, I noticed you
had Claude ones, and is there also a Cursor set as well? I don't know why we copied this. They're all the same in here between the two. I think they're maybe
linked together. But yeah, you can use these in Claude. So if I'm in Cursor, where's my Cursor? I could do slash. You open up the slash, which is
Where is the slash on this keyboard, dude? Right here. Nope. Nope. Right
here. Okay, there it is. Um, sorry. Um, so you have like all these commands that
you can just use. You share it with your team so everyone's coding the same way. Like I have this one for making a PR.
Like this will create a changeset, understand if the change was major, minor, or patch, and open the GitHub PR with a humble description. You've got to put the "humble" in there, because if not, it makes the PR description sound like you're a, you know... Like if you read it, you'd be like, "This guy is not a nice person who
opened this PR." Um, so you want it to be pretty concise and everything. So, this has worked out really well. I like the "without flowery or overly verbose language" part.
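For reference, a slash command in Claude Code is just a markdown prompt file in the project's .claude/commands directory, with $ARGUMENTS standing in for whatever follows the command. The prompt text below is an invented approximation of the make-moves flow described here (analyze, research, prove), not Mastra's actual command, and the project path is hypothetical:

```shell
# Hypothetical project directory; Claude Code picks up custom commands
# from .claude/commands/*.md in the repo.
mkdir -p /tmp/demo-project/.claude/commands

# Invented sketch of a make-moves-style prompt. $ARGUMENTS is replaced
# with whatever follows the slash command, e.g. `/make-moves 1234`.
cat > /tmp/demo-project/.claude/commands/make-moves.md <<'EOF'
Work on GitHub issue $ARGUMENTS.

1. Analyze: read the issue, its comments, and any linked reproduction.
2. Research: find the relevant package and code paths in this repo.
3. Prove: turn the reproduction into a failing test before any fix.
4. Report your plan, then wait for the next prompt before fixing.
EOF
```

Because the file lives in the repo, the whole team gets the same command, which is the "everyone codes the same way" point made above.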
Yeah, because it'll write like that if you let it, you know. Okay. Okay. We got a couple comments that we need to talk about. So,
Sebastian, what's up, Sebastian? I talked to you earlier today. Uh, "I always have a picture of Slash in my mind when I hear about slash commands." Yeah, Slash the guitarist. I can dig that. But Dev
says, "This is the second time I've heard someone relate AI coding to a real-time strategy game. The next IDE will just be StarCraft." I was literally, before you said that, Dev, thinking that same thing. It's like, it kind of
feels like it, right? The best StarCraft players, because I played a lot of the original StarCraft, were the ones that could manage a lot of things going on at once, right? There's tons of context and you've got to know, what
are these troops doing over here? Are we mining? Are we getting gas? Like
all these different things that you have to do simultaneously, and you've got to keep all that in your head and be able to bounce back and forth really quickly. I wonder, will the best developers and engineers just be the best real-time strategy players? If you're good at RTS, you're probably good at multiple parallel agents. Like I would
not put it past you: if you're one of those League of Legends dudes or girls, you'd probably kill it at this, because it's all about making sure all your... Well, right now, and I'll get into one point, there's no metrics about what's going on across all the agents, right? You know, like in games we have the health
bar going down here, and there's magic points and all this shit's going on. You need telemetry to manage all these agents, which we don't have, which I'm sure will happen soon. But then once you have all that, then you can start doing stuff. So I think it is very much RTS. Yeah. So Dev says he watches a lot of competitive Age of Empires II: a lot of multitasking, context switching, fire and
forget. It's kind of the same stuff. I mean, the only difference is, at the end of the day, at least as of today, you're still responsible for the code that you're committing, right? So, you still have to spend the time to understand it, at least at some level. Maybe at some point it's so good, you don't have to. I
think we're still a little ways off from that. There's still hallucinations. It's still not quite good enough to not look at things. But I do think that there's probably something to having,
you know, some kind of real-time strategy dashboard of like, hey, this thing's been going for this long. It's now stopped. And you can kind of like click hotkey between different, you know, different things. Like that's going to be the next interface for sure.
Yeah. A bunch of macros and stuff. You just hotkey Command-L, which maps to the PR commit, and then you just do stuff like that. Yeah. I mean, it definitely is like a
weird gamification uh of like just writing software which is kind of interesting. I thought this one was funny. Zerg Rush on 1.0.
Yeah, that's essentially what we're doing. Yeah. All right. Well, dude, should we get
into the news? Let's do it. All right. So, we're going to talk about
a bunch of new models that have come out. Some of them, you know, probably bigger and more important than others, but we're going to talk about a lot of them. So, first of all, last week, I think it was last week, I saw this message. I hadn't seen this model come out, so I don't actually know when it
was released, but DeepSeek Math V2. So, let's just share this post from Simon Willison, who says, "DeepSeek Math V2 means we now have an open weights Apache 2 model that can achieve gold medal performance on this year's International Mathematical Olympiad. Previously, proprietary models from OpenAI and Google DeepMind had achieved that." So
it's on Hugging Face. So basically you have a new DeepSeek model that is really, really good at math. So you know, again, open weights, that's good, means we can use it. It's good for the open source community to have a lot
of good models. But DeepSeek wasn't done, because today I saw this: DeepSeek V3.2 and DeepSeek V3.2 Special. I don't know,
Special, uh, reasoning. The first model built for agents, which is very interesting. So the 3.2 is the successor to 3.2-Exp, or Experimental. So now it's live, and Special is
pushing the boundaries of reasoning capabilities. So if you look at, of course you've got to look at the benchmarks, we have a comparison against GPT-5 High, Gemini 3 Pro, Kimi K2 Thinking, and then you have the new DeepSeek ones. And you can see, like on AIME 2025, it's competitive, but Special actually outperforms in pretty much all of them if you look.
So I don't know what they're doing. I don't know if they, you know, benchmark-maxed that one, or what. But at least on the benchmarks, it is killing it. They benchmaxed it.
I don't know. Yeah. Like, you know, maybe they just trained for that one thing. Like if I want to train one part of my body to hit a new PR, I'll just train that one thing. But, I mean, you can't post these
scores without raising some eyebrows. So this is, like, very, very impressive. Especially, again, it's coming from DeepSeek. I'm actually just making the assumption that it's open weights. Like, I actually don't know
that. We should probably... It looks like it's on Hugging Face. So yeah, I would imagine it is. Most of them are, but I
could be wrong. We should probably look at that. But it is uh very interesting and a pretty uh pretty big if true, so they say. Yeah. Uh so
this one is thinking... sorry, one thing about math. You know, like, some people are good at math and other people are good at English and stuff. I mean, it's kind of to Dev's point earlier: how do you determine models between each other? Do you think, like, for mathematics or something, people, you
know, like, the benchmarks you use, like, "how good is this model at math?" But as a general model, just like you would measure someone's intelligence by, you know, their education in math, science, whatever. Do you think that could be the way you determine the difference? Uh, I mean, I don't know. Maybe. I mean, do you think about that with people,
like, "that dude's good at math, so he's probably, like, smart," you know? Yeah. Yeah. I mean, I do, but I also think there are definitely different forms of
intelligence, right? Like, there's also a lot of people that are like, you know, "dang, that dude is smart," but maybe doesn't have the best common sense, you know? It's like they make a lot of bad decisions. So, it's like, just because you're smart at one thing... Or if they're psychos.
Yeah. Just because you're smart at one thing does not generally translate. I mean, in general it does, I think, but it's not always true. So I think the Gemma model that was a
psychotic one was probably good at math too. Probably great at math. Just, you know, a little over-analytical, and took things too seriously, maybe. I don't know. Uh, so we got another
this one came out, I think, towards the end of last week. You know, at Mastra, we always kind of keep up with what's going on in agent memory. You probably all care about this as
well. So, Supermemory came out last week and said it's now the best in the world at agent memory. So, state-of-the-art results on LongMemEval, significantly higher than everyone in the industry. So you can see their
benchmarks here, where they compare, you know, Zep and Supermemory on a couple different pieces of LongMemEval. Catching strays, dude. Yeah, they're always catching strays. Yeah, you know, I'm glad they didn't put us on this list. Yeah, I mean, we'd be right in
that, you know, towards that middle bar, though. We'd be right in there. So, um, you know, doing pretty well for now. You know, for now. We will see. You know, check back
check back soon. That's all I'll say. Uh, but cool to see. I think we've
mentioned this here many times. I really do think that agent memory is going to be more and more of a focus. You know, if you have these really good models that do well with the context you give it, how do you get the right information in the context? How do you have really long conversations where you can build agents that never forget, right?
Yeah. Hasn't been completely figured out, but it's definitely way better now than it was earlier this year. And I think next year is going to be we're going to see a lot more innovation in this front.
You can use Supermemory with Mastra, by the way. So if you want to use it, yeah, give it a try. So this one came up as well, I saw.
And I'll be honest, I didn't do too much research on it, but Prime Intellect has released a model called INTELLECT-3. They basically scaled reinforcement learning on a mixture-of-experts model, but it's based on GLM-4.5-Air. So,
it's kind of like, you know, they did some supervised fine-tuning and some RL on top of GLM-4.5-Air. I don't know how good it is. I haven't really looked at it, honestly. I don't care that much because I'm probably not
going to use it. But it got a lot of heat, both positive and negative. Like, some say it was way overhyped, but you can see it scores relatively well, better than, you know, GLM-4.5-Air on its own. So it basically shows that you can fine-tune
and add reinforcement learning on top of these other models and maybe get some interesting results. I think just the pattern of this becoming more common is going to happen though. So that's why I thought it was kind of interesting to talk about it. The idea that now we have
some of these other open-weights models that you can use and build on top of. We'll probably see some significant new models come out that are very good at certain things, just based on what you need. And we kind of talked about this a little bit: next year there's probably going to be a lot
of small language models that are going to be just fine-tuned. This one's not really small by any means, but it is very interesting. Any comments on that? I've heard about this Prime Intellect group from our
friend Abina, um, from Lucid. Uh, I think he knows those dudes or something. But yeah, pretty cool. People are just moving.
Another small... this one's kind of a small model update. So NVIDIA dropped something called Orchestrator 8B. It scored a 37.1 on the Humanity's Last Exam benchmark, which is better
than GPT-5, but it's also two and a half times more efficient. So it's just a relatively small model dropped by NVIDIA, available on Hugging Face. I don't think they really made a lot of fanfare about it, but, you know, someone noticed it. So people just be dropping models and not even telling people nowadays.
All right. HLE, just for everyone's context, is an academic benchmark.
So yeah. Yeah. Yeah. There are some comments in
the chat. Dev asks, "Are you guys working on better memory?" Maybe. Yeah. I mean, we're always improving.
We got some cool stuff coming. Uh, Juan says, "Been following Mem0. They're the homies." Yeah, Mem0 is also in V1. You'll be able to use Mem0 way cleaner in Mastra, so you should use it.
Yeah. All right. So, I did see a bunch of new, and actually some just happened today, video model updates. So, the cool
thing about video models is we actually get to watch the videos. So, let's do that. So, Whisper Thunder, that was kind of the code name. It was from Runway ML. And I actually kind of forgot about Runway, if
I'm going to be honest. Yeah. But now they are back on the map. So, let me share this.
So it's also known as Gen-4.5, and it's a new video model, and in a lot of ways I think it's pretty state-of-the-art from what... That car is tight. And obviously there's some editing putting all these clips together, right? But the stuff that you can do...
Yeah. The stuff you can do with some of these video tools is just... it's getting pretty insane. Any movie in the '90s. This could be any movie. Yeah, it honestly could.
Fast and Furious right here. Star Wars, Terminator, Game of Thrones. So, this says, "Today, we're excited to share our new frontier model. Gen-4.5 was built by a team that fits under two school buses and decided to take on the largest companies in the world. We are David and we've brought one hell of a
slingshot." So, dude, that's baller. Yeah. So, it's kind of cool, you know, like it makes you want to
root for them a little bit, you know? It's like they're competing against obviously huge, way more funded, way bigger teams, and they released something, you know, pretty cool. It kind of feels like, you know, back when image models first came out, it was like Midjourney, right? Midjourney was smaller, only worked on Discord. You kind of wanted to root for it a little
bit. This feels kind of similar, and pretty cool. Like, I've used Runway in the past. I thought... I was building some, like, video stuff. It was pretty good at the time, but I imagine,
you know, now it's much better. I'd be interested to see how this compares to, you know, Sora and Veo 3 and all that, but they did release some benchmarks, and I'm not sure what this ELO score is. I don't really know about this benchmark. You should look that up. What
is... what's this ELO score benchmark? What is that? We should figure out what that means. But they show the chart that they
always need to show, which is: on this benchmark, Runway is significantly higher than, you know, Veo 3, Veo 3.1, Kling 2.5, Sora 2 Pro, and yeah, sounds like it was just released today. So, Elo is a rating system that was originally developed for chess to
dynamically track relative performance and create leaderboards. Pretty cool. Okay. It's a measure of skill, made by comparing different models against each other.
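Since we looked it up live, here's the Elo math in a nutshell. This is a generic sketch of the standard chess-style update, not anything from Runway's benchmark; the K-factor of 32 is just a common default, and the function names are my own:

```python
def expected_score(r_a: float, r_b: float) -> float:
    # Probability that A beats B under the Elo model (logistic curve, base 10)
    return 1 / (1 + 10 ** ((r_b - r_a) / 400))

def elo_update(r_a: float, r_b: float, a_won: bool, k: float = 32.0):
    # Shift both ratings toward the observed head-to-head result
    e_a = expected_score(r_a, r_b)
    s_a = 1.0 if a_won else 0.0
    return r_a + k * (s_a - e_a), r_b + k * (e_a - s_a)

# Two equal models; one win moves the ratings 16 points apart each way
ra, rb = elo_update(1500, 1500, a_won=True)  # ra == 1516.0, rb == 1484.0
```

Run enough pairwise comparisons (say, human raters picking the better video clip) and the ratings converge into the kind of leaderboard Runway is charting.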
Like LMArena and all that type of stuff, Elo ratings. Cool. So, the model, you might have heard, was previously known as Whisper Thunder, aka David. They're
really leaning into the David versus Goliath theme. State-of-the-art: sets a new standard for video generation, motion quality, prompt adherence, and visual fidelity. So, cool. If you're interested in video models, check out the new Runway model. But that's not all. It's not the only
video model that came out today, or at least has come out. I think it actually released today. I wonder if they, you know, timed it. I don't know which one launched first, but I wonder if Kling saw that, you know, Runway launched,
or vice versa. They're like, "Oh, we've got to launch, too." I like Kling. So, there is a new Kling model, and
Kling 01 is here. It's been tested the last few days. It's a beast. It can use multiple reference images and get 360°
consistency, control movement using a reference video, and edit video from a prompt. It's available now. So, here's
the video that this person put together. Meet Kling 01, the unified multimodal model that lets you create and edit video with a single prompt. It lets you do anything. It's almost like magic.
Combine a reference video with a reference image to change a character. It even works with multiple characters. Provide reference images from multiple angles to achieve 360° consistency, and edit existing videos: change characters, backgrounds, or remove objects.
So, skip the tavern tonight. There's a new video model in town. That's legit. Yeah. So, that's pretty cool. I mean, that character consistency is
always going to be one of the hardest pieces to solve, right? If you're going to build any kind of real storytelling tool with video, you need character consistency. This obviously is a huge step in that direction. Um, let's... There's the actual announcement from Kling. We'll share
that as well. Cool thing about Kling is you can put two clips together and it kind of figures out how they transition, which is cool. That's how I did that Ghibli video back in the day. Yeah, I've used Kling as well for some just, like, fun videos.
It also can make some weird stuff happen, too. All right, so Kling Omni launch week, day one. So they're apparently doing a launch week. So they must be launching a bunch of things this week.
But they introduced Kling 01, a brand new creative engine for endless possibilities. So, multimodal understanding, you can see... you can get some free credits, whatever. Let's watch the video.
That's tight. Whoa. Heat.
legit. Yeah. So, I think the cool thing about Kling is it feels like you have much more control. It's like a video editor, right? Yeah. Where, you know, if you've used Veo 3, I
mean, they have some tools, you know, or Sora, they have some tools to do video editing, but it feels more like a generator, like you're generating just, like, videos that you then have to pull in somewhere else if you're going to edit, where Kling feels much more like a power tool. So, that's my experience at least. Yeah, dude. I
wish we could meet, like, an AI filmmaker, um, and see what the process is, because it has to be super detailed, like, clip to clip and everything. I wonder what tools they use. Yeah. Well, I mean, I bet you it's a combination, just like with writing code, right? There's
like the new tools, but then they probably have their, you know, their old tools that they go back into. Like, I have a video editor, right? Like, I know what I use, what I'm familiar with. So, I'll pull a lot of clips into
there and do it. And obviously, I imagine the experts have their tools as well that they are very comfortable with, but they can generate the clips in other tools, do some things, and then pull it into what they're familiar with. I did like Runway's video better.
Yeah. And, like, the quality looked a lot cooler, too, but it's tough to know if it's real or not. I guess we have to play with it. Yeah. Yeah. I think that's the thing: it's probably, you know, not every
shot is going to get you there. It's going to take multiple generations, I imagine. Uh, Amjad says, "Hello." Hey, Amjad. Juan says, "Wow. Kling 01." Clap.
Dev says, "Honestly, all I wanted was a better background remover." Well, hopefully by now you have it. You know, hopefully Kling can do that for you. Um, and then, yeah, Sebastian says they should also release a making-of for all these trailers. I agree, because it's not like they just generated it; it took a long time to
generate a two-minute trailer. But if you think about the amount of money and time it used to take to produce any one-to-two-minute trailer, it's expensive. So now you're just paying for token costs or credits. All right. So, one other thing that I saw kind of come
across these last couple weeks, and it's gotten quite a few stars on GitHub, so I thought it'd be, you know, interesting to share and talk about: this thing called Better Agents, which is "standards for building agents better." I've been seeing this all over X, probably because I interact with a lot
of agent-related things in the dev tool space, but it's a CLI and a set of standards for agent building. It supercharges your coding assistant, like Claude Code or Cursor, making it an expert in any agent framework you choose (Agno, Mastra, etc.) and all their best practices. It's the best way to start any new agent project, they say. I haven't tested it; I can't validate it, but, you know.
Yeah, they mentioned Mastra, so I appreciate that. Those are the homies from LangWatch. Yes. So, you can see it's under LangWatch. Uh, so it basically tries to
generate best practices, or use best practices, so it can get you started quickly and make your assistant an expert at writing agent code. I'd be curious how well it does compared to, like, our MCP docs server, you know. And do they work well together? Do you just pick one? You know, those are the questions I would
have. But if this is something you're interested in, go give it a star on GitHub. While you're there, give the Mastra project a star, too. Just context engineering, dude. In
another form right there. Yeah. So, we keep coming back to the same patterns.
All right, dude. Well, let's bring our guest on. Yeah, let's do it. Before we do, if you
are just joining us, thanks for tuning in. We do this thing live on Mondays, usually around noon Pacific time. Please go give us a review on Spotify, Apple Podcasts, wherever you like to listen to this. If you're not listening live,
thanks. And each week, we bring on guests. We talk about cool things like this next guest, this next topic. We we kind of you you kind of teed it up a little bit earlier when we were
talking about parallel agents, and now we're going to go a little bit more in depth. So, let's do it. Yeah. So, let's welcome our guest, Kiet. What's up, dude? What up, gang?
How you doing? Good to see you. Good to see you, too. I am, uh, calling
out of the Freestyle hacker house. Nice. And I, uh, I guess maybe before we get started, you want to introduce yourself and then kind of tee it up with what you're working on, and we can dive right in. For sure. So, my name is Kiet. I am, uh,
co-founding something new called Superset that, uh, you know, I've obviously been around for kind of the inception of it for a while now, but we just launched it semi-officially this morning, and I've been getting some good feedback on it. But it's a way to run multiple agents all in parallel. It's a terminal that lets you create worktrees and group your agents
in there so they can all work in, like, a totally parallel clone of your repo. And the goal is to let you do this for, like, 10 agents at once. Like, go run a hundred agents in a day. It's in active development and, uh, yeah, just getting started. Very
excited. I've been using Superset basically as my daily driver for the past couple weeks. I'm pretty bullish, but I'm kind of biased. Yeah. So, it's superset.sh,
right? That's the website. Yes, sir. All right. So, we're just looking at the website now. The terminal app for
parallel CLI agents. So, you've got a video there for anyone who wants to see it in action. Can you give us a demo? Can we see a demo? You can probably see that
demo, if you want to mute it, and I can, uh... Yeah, let's do that. Talk through it. Hi there. I'm Satia.
So that's Satia, one of the co-founders, talking about it, but he'll showcase it in a second. I think you can pause there. It is a terminal management tool. So, we're building this on Electron with node-pty and xterm. And Obby was telling
me how that's kind of an ass stack, but that's the way it is right now. Uh, all in TypeScript; the whole team is big TypeScript developers. So, you'll see there's a top bar, where each top bar represents a worktree that you create. So, like, he'll create a worktree in a second here, but I'll just give the
rest of the tour, which is: when you create a worktree, everything you do in there is grouped on the left-hand-side tabs there, and each tab is a terminal. We'll have other types of tabs in there as well, but each of them represents an agent. So my flow, I like to have Codex and Claude Code, like, running in
parallel. I might have my lazygit or, like, a dev server running. So it's everything terminal for you to run in a worktree.
What's the inspiration behind the idea? Is it something you saw in the market, or how'd you get here? So, I noticed over the past couple months... I'm an avid programmer, but the core day-to-day of my job has really changed from writing a lot of code to kind of, like, queuing up a lot of agents and
orchestrating them and moving their stuff around. And so I started to run into the kind of, like, the wall that we all run into of: what do you do when your agent is running? Like, do you go on TikTok, go take a walk, make some food, or do you start another task?
And so I started starting other tasks, or playing around with worktrees. There was just so much context switching and so much to manage organizationally, and there was just no good tooling for it. And so we're building that to solve that problem.
That's sick, dude. There's a lot of competing products now; we actually just went through a couple earlier in the show. So I guess, one: even if they're not using your product, why should people start doing things in parallel?
Um, I think the big one is that you used to not be able to do things in parallel, right? Like, we are now, very recently, at a point where the agents are semi-autonomous and can do stuff, and your job is more sitting at the high level and orchestrating these guys. And so if you're just doing that single-threaded, you're spending, like, 90% of
your time just waiting for things to happen. And so, uh, I think if you want to be really productive now, you kind of have to operate like a VP of engineering, like, at a higher and higher level. And so I just think this is the next phase of coding. It's just like you have a bunch of junior engineers and you kind of
orchestrate them. You kind of have to executive-manage, is my take. Are there any, like, tips that you have for people who are using multiple coding agents? Any, like, tricks of the trade? Mhm. I mean, I would say you should use
Superset, but I also would say that you should experiment with running worktrees, and try other tools that do worktrees, because it's a good mechanism. Um, and, like, UX for terminal management is the key. Like, there's a lot of automation that you would want to set up for creating a worktree, tearing them down, spinning up, like, parallel agents.
We're also going to build a way for you to get notified when things are done, and you get, like, a queue of stuff to look at in Superset. Once your agents are done running, you should have some kind of mechanism like that. Like, set up hooks for your agents, um, and just, like, have them kind of report to you, and
basically make it so that you don't go crazy switching between all the tasks. And it's just a lot of organizing.
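The worktree mechanism Kiet keeps coming back to is plain git underneath. As a rough sketch of the pattern (not Superset's actual code; the repo path and the agent command are placeholders you'd swap for your own CLI agent):

```python
import subprocess
from pathlib import Path

def spawn_worktree_agent(repo: Path, task: str, agent_cmd: list):
    """Create an isolated worktree on a fresh branch and start an agent in it."""
    wt = repo.parent / f"{repo.name}-{task}"
    # A worktree is a second checkout sharing the same .git object store,
    # so parallel agents never clobber each other's working files.
    subprocess.run(
        ["git", "-C", str(repo), "worktree", "add", "-b", task, str(wt)],
        check=True,
    )
    # Launch the CLI agent in that directory and return the process handle.
    return subprocess.Popen(agent_cmd, cwd=wt)

# Hypothetical usage: three tasks, three agents, all running at once
# agents = [spawn_worktree_agent(Path.home() / "myrepo", t, ["claude", "-p", t])
#           for t in ("fix-auth", "add-analytics", "refactor-db")]
```

When an agent finishes, you'd review, merge the branch, and `git worktree remove` the directory, which is the teardown automation he's describing.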
like, how many can I run? I suppose it depends on how good the models are, how much you can trust them, how much leash you're willing to give them. But, um, like, Obby, I know you kind of mentioned casually, like, two to three is, like, your sweet spot, where you feel like you can run two to three agents at a time and still keep tabs on what's going on.
I know. Um, others on the team have said they've tried to do up to five, right? But maybe it broke down a little bit when they got too many. Is there any
threshold that you've noticed yourself, Kiet, or that others who have been playing around with it have noticed? Like, okay, when it gets beyond this, it's maybe less efficient? Like, there's probably a sweet spot of efficiency, right? Yeah. For me personally, I can run
like three if I'm chilling, like five if I'm, like, overclocking. Um, I do know someone who is, like, legit cracked, uh, like an MIT guy, a CTO, also was in YC, that is running 10 at the same time, and I think he has a lot of automation set up to do that. And he actually has a very similar setup to what we built at Superset, but in his own
terminal management tool, like a kind of team-style setup. Um, and it's kind of a tooling solve, in my opinion. I think you really can run 10 at once, and I think you can run over a hundred a day. That's kind of the goal of Superset. I think you can run a million if you
want. I think you can run 10 million. Yeah, because not all this compute has to happen on your computer. Maybe it happens to happen there today, but not in the future. Like, this is actually an
interesting story for you, Kiet. Um, so I hung out with a homie at Dogpatch Saloon, which, for all y'all who know, we hung out at the Dogpatch Saloon. Mastra people hang out there, or we used to before we moved, but I ended up back there somehow. And I was talking to some people who are engineers,
and they have two laptops as their concurrency solution. And I just thought, like, why are you doing that? Like, why don't you just have, like, a node in GCP and just, you know, SSH there and,
like, just do stuff? But, like, for them, the infrastructure problem is actually not in their wheelhouse, right? They're not that type of engineer or something. So they just bought another MacBook, which I think is the wrong solution, right? Yeah, there might be some physical scaling limits to that. Yeah, we got some good questions here. So, this one's about Warp; we'll talk about that
in a second. Um, and then here's a good question from Sebastian: "Is your vision more focused on running the agents in parallel live, or on running 100 or something in parallel and managing them in groups and sessions?" Yeah, it's kind of interesting. I imagine it scaling
like an org, where you just get higher and higher level. Obviously, like, you yourself running 10-plus threads is almost impossible. I do see, right now, my workflow is, like: I have Codex, or I have Claude Code. Like, Codex is doing review, I'll, like, copy and paste it into Claude Code, "hey, implement this," and then bring
it over to, like, Cursor Compose and tell it to refactor. So, you do all these things, and I feel like you want to start putting an agent on top of that, using, like, something like Mastra, and have that agent now, you know, manage your workflow. And that kind of scales down or scales up, is my opinion. But
again, it's really early. We're trying to solve, like, the five to ten first, but that's where I'm imagining it going: adding more orchestration on top. Yeah, because right now, like, a coding agent is, like, per branch, one agent, let's say, right? You can run Codex and stuff, but the context window is not the same across all these tools. So if
you're in the same branch and you're doing your review agent, and then you have to copy and paste to the executor, and then you have to copy and paste that to the next dude, that's whack, you know? So there has to be some shared context within Superset, or any of these products, that shares the context between all the coding agents. Then you can have little swarms and little
pods, right? You can have little, like, pods of this, and then your slash commands can be like: actually, you're going to hit Claude Code and you're going to do this, and then after, you're going to take that, put it into Opus, and then make moves, right? Are you planning on context engineering your
Absolutely. Yeah. Yeah, I think it's necessary. Um, yeah, it's very
much a tooling problem and a context problem, where you want to have something that decides... Like, right now I'm manually kind of managing the context and feeding it between the agents, but I don't see why that can't be automated. It's kind of low compute to figure out what to throw one way or another, or low reasoning, I think.
Yeah. Or you have, like, a bunch of conversations, or, like, subthreads, that every agent has access to, and things like that. We'll take a look at it together when we do some context stuff. Yeah. At the Dogpatch Saloon
next week. Exactly. Dogpatch Saloon. Yeah.
I'll be in SF next week. So, you know, meet us there. Let's talk about the Warp thing real quick, though. Shane had this question earlier. Shane, if you want to ask, like, the differentiator question... Actually, I'll let you ask the question. Yeah, so, you know, there's a lot of really interesting products coming out in this space, and actually, we had this whole conversation of: it's kind of feeling like if you were good at real-time
strategy games, you might be good at managing a bunch of agents. Because if you're good at StarCraft or whatever, and you're managing all these resources in these different places, managing agents kind of feels like that. You're, like, bouncing back and forth. You've got to keep everything in your head, keep track of where they're at with these different tasks. And if you can do that really
well, then you basically have the right level of micromanagement when needed, but you can give them a leash, and I feel like you can be very productive. But where do you see it going, with all these different products that are out there? Like, how do you think someone wins in this space?
Like, what are the key differentiators? Is it the UX? Is it the context? Is it a mixture of
both? Like, what do you think really is going to set Superset apart? But also, like, for the others, what are the things that are going to help people stand out that are building in this space? I think, um, there's not a good answer to this moat question, honestly. Like, even in tooling, I think that for something like Warp, or even Superset, like, a terminal tool, there's a
pretty low switching cost. It is very much, like, a UX solve, uh, right now. For us, my take personally is: you just build, like, the most cracked team and you try to out-execute. And if you don't, then I guess you lose. Um, so that's what I spend my cycles on, at least, just getting a really good team together. Um, bit of a
non-answer, but I think the other side of it, the moat, is that in these workflows you are generating some preference and your settings for it. As far as, like, if we go further down into the orchestration route, uh, you'll start building in kind of how you do your workflow. And I think we're close enough with a lot of our users that
we're figuring this out with them, and we're building that into the tooling. And so, at least in the short term, that's how we think we can win: just get a lot of engineers that we know are great and have this, like, exact need that we're solving, and that Warp is not really paying attention to, and just build the tooling that doesn't exist for
that. Yeah. Um, so a couple questions in the chat.
One asks, "Does Superset have native built-in tooling for notifying when the agents are done and require your attention?" Yes. Uh, one interesting thing we're doing is kind of overloading when you call Claude Code or when you call Codex, and automatically setting up the agent hooks for you. So,
one of the hooks is to notify you when it's done or it needs your attention. Some of the other hooks are, like... well, at each step we can plug in an agent and kind of, like, summarize and give you a higher-level view than what the agent would produce, and save you tokens there. Uh, there's other stuff, like the start hook, to kind of, like, name
the branch, and it really makes it so that you can context-switch more easily. So yeah, the answer is yes. Like, we're working with the native hooks for these agents and staying on top of that. You've got to support Cursor Agent, dude. Cursor Agent doesn't support itself,
man. Build more hooks into their CLI. We also have, like, a catch-all to check the threads, and, like, if one's idle, we'll give you a ping as well, and help you set that up.
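That catch-all idle ping boils down to polling your agent processes. A hedged sketch of the general pattern (not Superset's implementation; the notification side is just a comment, since the real thing would hook into the OS or the agent's own hooks):

```python
import subprocess
import time

def watch_agents(procs: dict, poll_secs: float = 2.0):
    """Poll named agent processes; report each one as it finishes."""
    finished = []
    pending = dict(procs)
    while pending:
        for name, proc in list(pending.items()):
            if proc.poll() is not None:  # exited -> agent is done or gave up
                finished.append((name, proc.returncode))
                del pending[name]
                # A real tool would fire a desktop notification here, e.g.:
                # subprocess.run(["osascript", "-e",
                #                 f'display notification "{name} done"'])
        if pending:
            time.sleep(poll_secs)
    return finished
```

Superset layers hook-based signals on top of something like this, but even a dumb poll gets you the "queue of stuff to look at" once agents wrap up.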
Yeah, they don't want you to exist, you know. Yeah, they want to build their own. Yeah. So, they don't want to play nice. So, one also asks, which is a little bit different question: one thing that is
becoming expected is to start and review a progress or a task on your phone. What's your opinion of this, and will this be available in Superset? Yeah, there's a branch out there, or a worktree out there, to do that in Superset. I think the thing that I'm
fighting a little bit with my co-founders on is: I don't think you should turn your brain off when you're developing. And I feel like when you're doing stuff on your phone, and you're just like... that seems nice, like, you're not looking at the code; it's kind of a brain-off moment. And I think you still have to think really hard. Um,
but yeah, you should be able to kind of ping it and, like, maybe get a summary. I just don't want us to enable, like, bad engineering practice to happen. That's my resistance, uh, against phones. But yeah, like, you can set up your, like,
cloud environment and clone worktrees. Uh, we don't have all of that built out, but we're definitely playing around with it and are aware. I just worry about you actually producing something useful when you're coding on your phone. Shane and I have a benchmark for coding agents. It's called the bar test.
Um, and if we can't prompt at the bar, then it's not that good, dude. All right. So, here's my take. I agree with you that if
you want good results, especially today, like, you've got to micromanage that agent, you know? Like, you can't really trust it, even with, like, four or five. Like, I don't believe that you can really completely let it loose. Like, you've got to monitor it. It should be, like, a collaborative experience, right? You're not just, like, handing off a task to a junior engineer.
You're actually, like, working with them to, like, accomplish it. And you can move faster because you're working with multiple of them at once, right? Um, but I do think there's a subset of tasks that really are fire-and-forget. It's just, like,
just bring it back. I don't I don't care about the timing of it. Like I just it's it's in my head. It's like there are
many things where and I can give you an example. Like last week I wanted to add some like product instrumentation for analytics to our website. And sometimes I would just do that myself. I just write the code. Sometimes I would just like open up a linear issue and someone on the team
might pick it up or maybe I'll pick it up later, but I just, you know, now I just go to cloud code and just like fire it and I'll come back later and I'll like turn it into a PR. It'd be really great though if they just did the whole process because it's a pretty easy thing to do, right? Like I I trust like I'll review the code, but it's like 20 lines
of code. Just do it right. I'll review it. I'll know if it's good or not. I mean, I'm be honest, I didn't even
test it. I looked at it. I was like, "That's good. I just shipped it. Website didn't break because I knew because I
knew what it was. Like I knew how to do it. I didn't know exactly where in the files like to do it. I didn't know how we did all the patterns, but I knew what
good looked like. So I could just like fire it, go to a meeting, you know, fire and forget, come back later, ship the thing, and it all worked. So I think there's like a different class. Like if it's a hard problem you absolutely have to be
engaged. It's like an easy thing you would know how to do, but you just don't want to spend time on. Maybe there's a like you could do that from your phone while you're at a bar or, you know, while you're uh while you're doing something else.
Yeah, your kid's soccer game, you know; you just want to check in.
So we've got a couple of other comments and questions for you. A very active chat today, which is great. Dev says, "I very badly want Cursor to support subagents." Juan says, "Sorry for so many questions. I've been thinking about building something like Superset that would talk to an orchestrator on my laptop, so I can interact with agents from my phone to start tasks whenever I have an idea." So that's kind of what I was just saying: you think of something, you don't want to write it down for later, just send it off and get it started, and then you might pick it up yourself later.
Yeah, no worries about the questions, Juan; they've been really great. We're open source, so feel free to fork it, PR it, whatever you'd like, or build it yourself and add it. I've heard good things about Omnara. I am worried about, again, the phone thing. But I can be open to it. Obviously we're playing around with it, and it makes sense. I do think it gives you the tool to shoot yourself in the foot. You should refrain from doing so.
Some users want to shoot themselves, though. And that's okay. That's why we sell guns. That's why we... yeah, that was a bad joke. I shouldn't have made that.
Hey, this is live. We can't take that joke out. The USA. Cut it. Cut it. Cut the joke. Yeah. So, anyways.
Yeah, I mean, it's kind of this new paradigm, though, and I'm feeling more and more pressure that I should just have agents writing code for me all the time. I think of myself as a pretty good engineer, but I also think I'm pretty good at product, and I have a lot of ideas. A lot of you listening to this probably think the same: you have all these ideas you wish you could execute on, but you know you're never going to have the time. Well, now maybe you don't need as much time. You can actually start to execute on some of those ideas, whether it's personal software, like the time-tracking app I talked about earlier that I one-shotted and that my brother now uses every day with 25-plus employees, or whatever personal productivity stuff you want. So for me, I feel like I should be running multiple agents all the time. Sometimes I need to be actively involved, because I need to monitor the results, but sometimes it's just for fun personal projects; I just want to see what it comes up with. I may never use it, so just fire it off and let it cook. Spend a bunch of tokens.
So active development is like a real-time strategy game, and then inactive or dormant development is like RollerCoaster Tycoon. Just set things up and chill. Let it go. You come in and check on it, you know?
Yeah. You'll probably get better results if you check in more often, I suppose.
Yeah, I think so. Like you said, Shane, the world is changing. There's so much you can do. It's such an exciting time to build, and building an agent is just so much fun, because I honestly get so much value out of using Superset and using parallel agents, regardless of the tooling. So I'm on a mission to tell all my friends to do it, watch them struggle,
and try to build the tooling around it.
Yeah, I'm a Superset user. That's right. I've closed two PRs today with Superset.
That's sick. That's good to hear.
I've also bitched a lot to you, so...
Well, you do that regardless, though. Yeah, true.
All right. Well, I'm going to test it out. I've used Conductor a little bit, you know; I've used some other tools. I'm going to give Superset a run, because it feels like a lightweight solution, which is kind of what I'm looking for right now: something to help me manage this so I don't have to manage my own git worktrees, because that seems like, you know, a waste of time. I don't want to do that. That's a dope feature. Yeah, it's very seamless for sure.
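Managing worktrees by hand is exactly the bookkeeping that tooling like Superset automates. For anyone curious what the manual version looks like, here's a rough sketch with plain git; the repo is a throwaway temp directory and the task and branch names are made up for illustration.

```shell
#!/bin/sh
# Sketch: one git worktree per agent task, so parallel agents never
# share a working directory. Runs entirely inside a throwaway repo;
# the "add-analytics" task name is hypothetical.
set -eu

root=$(mktemp -d)
repo="$root/repo"
mkdir "$repo"
cd "$repo"
git init -q .
git -c user.name=dev -c user.email=dev@example.com \
    commit -q --allow-empty -m "init"

task="add-analytics"
# Each task gets its own branch and its own checkout directory,
# created as a sibling of the main repo.
git worktree add -q "../wt-$task" -b "agent/$task"
git worktree list   # lists the main checkout and the new worktree

# ...an agent would work inside ../wt-add-analytics here...

# Once the task's PR lands, tear down the worktree and its branch.
git worktree remove "../wt-$task"
git branch -q -D "agent/$task"
```

Per task that's three commands to set up and tear down, times however many agents you're running, which is why handing the cycle to a tool quickly feels worth it.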
Thank you. Yeah, please do give me any feedback you have. We have worktree setup automation that I need to add the UI for, but you'll see like five PRs a day, because we have three engineers working on it and they're all using Superset. So they're shipping a lot of PRs all at once, and it's changing a lot.
It's always good when you can use your product to improve your product, which is probably what you're doing, right? It's like, let's get Mastra into this codebase so we can use it as well. Yeah, let's do it. We need the orchestration agent on top. There you go. Good thing it's open source. You'll see some PRs from me eventually.
That's sick. Yeah, you made a PR
into Onlook as well.
Oh yeah, I did.
Oh, speaking of, when's your interview? Tell the audience what you're moving on to.
We have a YC interview tomorrow. So we're launching; we're just executing on the business, but yeah, we're excited to talk to the YC folks. My pessimistic take is that we're pretty early. And honestly, I'm working on the team; I'm trying to make sure that we're really working well together. So regardless of what happens, we're just going to execute on the business. Any tips, I guess, on the interview?
I thought we bombed our interview. We got rejected in our Onlook one, and then we just kind of continued working on it, launched, went viral, and then heard back from the partner. So it's never over.
Yeah. You guys are three previous YC founders, right?
Previous YC CTOs. Yeah, three CTOs.
Oh my god, I forgot about that. Y'all are technical as hell. That's the bet; the bet is the team.
Yeah. Well, I mean, in this space you've got to be able to move fast, and having a good team gives you an unfair advantage, right? It's hard to build a good team, so if you already have a good baseline, that's going to help.
Robot Glock says, "What did I miss?" Robot Glock, thanks for joining. Superset: go check it out at superset.sh if you want to run many Claude agents or Codex agents from your command line at one time. So you can run parallel agents. And we'll just share this one more time. Go ahead.
It's a GitHub star party. Go to superset-sh on GitHub and give it a star. While you're there, give Mastra a star. We appreciate that. And yeah, let's get these stars bumped up.
Yeah, appreciate it. Hot off
the press: launched only a few hours ago. So yeah, just launched.
Did you launch on HN?
We did. We got buried. I've never been so buried on HN before.
Did you share the direct link, dude?
No, I didn't even... I just posted, and then there were like 10 other posts in the same minute. It's a launch day. Yeah, we'll launch again. HN is not going anywhere.
Yeah, true. Keep launching.
Will do. All right, Kiet, thanks for coming on. I'm sure we'll hang out soon.
Yeah, hit me up when you're in SF and we'll play some tennis or hit up the saloon soon. All right, see you guys. Thanks for having me.
Yeah, fun. Bye.
All right, dude. That's a show. That was a show. Dude, Dev had a good roast on us right there. I just want to put that out there, you know. It still hurts. Okay, Dev, it still hurts.
All right. Anything else you want to share before we wrap this thing up?
Just Mastra-related stuff. We're on a quest right now to fix all our issues. So if you're part of our community, please report issues, make fixes, and help others in the Discord. It really goes a long way.
Yeah, honestly, even just posting a detailed issue in Discord gets to GitHub for us, so we appreciate you letting us know. And try the 1.0: if you tried it right away and you try it again today, I think you'll see much better results. A lot of people are using it now in production. So use it, test it out, and report issues if you do find any. We appreciate it. And the slash commands we showed today: if you're encountering an issue, you could also solve it yourself using those commands, since we're open source. So have fun with that.
So, someone's asking for the Discord link. It's in the footer of the mastra.ai website, but I'll grab it as well. One second; I'm going to drop it in the chat. Actually, I guess I can't really drop it in the chat; it's not going to post to X. Sorry, it doesn't let me do that. But it's in our YouTube. Go to mastra.ai, go to the footer of the website, and you can find the Discord link. Join our Discord and come join thousands of other people building and playing around with Mastra and building some cool things.
All right, dude. That was a show. That was good. Fun times.
Yeah. Thanks, everyone, for watching AI Agents Hour. We do this every week, usually on Mondays around noon Pacific time. Please give us a review if you haven't already. Please like, subscribe, and share, and tell others if you think they'd be into learning more about AI agents and all this crazy stuff going on in the AI world. We'll see you next time. See you. Stay.