Codex Adds Pets, Cursor Ships an SDK & Claude Connects to Blender and Ableton - This Week In AI
Shane and Abhi are in person at the CodeRabbit studio, and AISI just quietly torched one of Anthropic's loudest narratives. AISI confirmed GPT-5.5 is the second model to complete a multi-step cyber attack simulation end-to-end. The first was Mythos. David Cramer calls TUIs "caveman shit." Kenzie at Browserbase builds an agent in under ten minutes that ranks every SF tech event by free food probability. Codex ships Tamagotchi-style pets. Apple accidentally leaves CLAUDE.md files in a support app update. Cursor releases its SDK. OpenCode 2.0 becomes embeddable. Matt Pocock drops Sandcastle. Warp goes open source. The harnesses are becoming frameworks, and the frameworks are growing harnesses. Anthropic ships connectors for Blender, Ableton, and other creative tools. Claude Security enters public beta. /goal lands in Codex CLI as OpenAI's take on the Ralph loop. OpenAI says GPT-5.5 is its strongest launch yet: API revenue is growing more than 2x faster than any prior release, and Codex revenue doubled in under seven days. Vasuman posts an essay on why building real agents is harder than the hype suggests. Open weights keep closing the gap. Kimi K2.6 beats Claude, GPT-5.5, and Gemini at a programming contest. Qwen3 6.27B takes the open weights crown under 150B parameters. Mistral Medium 3.5 lands as a 128B dense model with 256k context. GitHub has a rough week. Wiz Research discloses an RCE achievable with a single git push. Agents are becoming customers. Stripe Link is the wallet for agents. Cloudflare lets agents start paid subscriptions. Doola and Replit will form a US LLC inside the chat. Ramp's coding agent now writes 70% of merged PRs. DeepSeek's input cache is 10x cheaper. Node 20 hits EOL, Zod prepares to drop CommonJS, and TypeScript native previews ship. AI Agents Hour is a weekly livestream by Mastra CPO Shane Thomas and CTO Abhi Aiyer. Mondays, 12PM Pacific.
Episode Transcript
1 00:00:00,001 --> 00:00:04,640 Shane: So maybe Mythos isn't as, you know, great as it initially seemed. 2 00:00:04,640 --> 00:00:04,960 Shane: Yeah. 3 00:00:04,960 --> 00:00:05,120 Shane: Right? 4 00:00:05,360 --> 00:00:08,960 Shane: The all the security concerns around Mythos, the touting of that is this 5 00:00:09,219 --> 00:00:10,900 Shane: Great model that the world can't have. 6 00:00:10,900 --> 00:00:15,700 Abhi: Yeah, so maybe Mythos isn't some myth, you know, just like a regular model. 7 00:00:29,340 --> 00:00:32,460 Shane: Hello everyone and welcome to Agents Hour. 8 00:00:32,460 --> 00:00:33,180 Shane: I'm Shane. 9 00:00:33,180 --> 00:00:34,220 Shane: I'm here with Abhi. 10 00:00:34,220 --> 00:00:34,940 Shane: What's up? 11 00:00:34,940 --> 00:00:37,580 Shane: As you can tell, we're in person today. 12 00:00:37,581 --> 00:00:41,160 Shane: Thanks to our friends, CodeRabbit, for letting us borrow the beautiful studio. 13 00:00:41,160 --> 00:00:45,720 Abhi: Um if y'all don't use CodeRabbit, uh we do, and y'all should too. 14 00:00:45,721 --> 00:00:48,980 Abhi: CodeRabbit is an AI review product, uh, and much more actually. 15 00:00:48,980 --> 00:00:51,940 Abhi: We'll talk about that uh later, other features they do. 16 00:00:51,940 --> 00:00:53,300 Abhi: But at Mastra 17 00:00:53,540 --> 00:00:56,420 Abhi: Um we generate a lot of code, so many PRs. 18 00:00:56,420 --> 00:01:01,300 Abhi: Now that we're using Devin too, like there's this unimaginable amounts of PRs being created 19 00:01:01,301 --> 00:01:06,360 Abhi: And we have a rule on our team, which is the rabbit must be quiet before the PR gets merged. 20 00:01:06,360 --> 00:01:13,240 Abhi: So thanks CodeRabbit for not only sponsoring open source projects like ours, but giving us the space in this dope ass studio. 21 00:01:13,240 --> 00:01:14,440 Abhi: So thank you guys again. 22 00:01:14,440 --> 00:01:15,960 Shane: We got a lot of news. 
23 00:01:15,961 --> 00:01:17,920 Shane: Lot of great things to talk about today, a lot of drama. 24 00:01:17,920 --> 00:01:21,200 Shane: First let's start with our friend, David Cramer's posts. 25 00:01:21,200 --> 00:01:22,479 Shane: TUIs are no good. 26 00:01:22,479 --> 00:01:23,759 Shane: Sorry y'all 27 00:01:23,760 --> 00:01:25,800 Shane: A CLI's utility and situational. 28 00:01:25,800 --> 00:01:31,080 Shane: This should not be confused with stuffing a fully interactive GUI into a low capability platform. 29 00:01:31,080 --> 00:01:35,240 Shane: Let's ignore all the great UI technology for the last 20 years. 30 00:01:35,241 --> 00:01:36,660 Shane: And build some caveman shit. 31 00:01:36,980 --> 00:01:39,700 Abhi: It's a typical David uh take for sure. 32 00:01:39,700 --> 00:01:42,500 Abhi: I believe Dax also replied to this, you know, like 33 00:01:42,501 --> 00:01:43,040 Abhi: Of course. 34 00:01:43,040 --> 00:01:44,640 Abhi: You hate us cause you ain't us, you know? 35 00:01:44,640 --> 00:01:45,280 Shane: Yeah, of course. 36 00:01:45,440 --> 00:01:47,360 Shane: 'Cause it feels like it was probably targeted. 37 00:01:47,360 --> 00:01:54,560 Shane: It w I doubt it was targeted specifically at open code, but it you know they they gotta feel a certain way about a post like this. 38 00:01:54,561 --> 00:01:58,600 Abhi: I think most of these AI products are CLI and like everyone's going to CLI first. 39 00:01:58,600 --> 00:02:02,280 Abhi: If your users are already at the CLI then you have to build a good TUI. 40 00:02:02,281 --> 00:02:04,820 Abhi: There's a lot of n nice libraries to do so now too. 41 00:02:04,820 --> 00:02:09,140 Abhi: But I also kind of feel the sentiment, like I don't really like TUIs that myself. 42 00:02:09,459 --> 00:02:09,619 Abhi: No. 
43 00:02:09,619 --> 00:02:14,980 Shane: I had someone try to convince me that they were getting all their business folks using 44 00:02:15,080 --> 00:02:20,600 Shane: Claude Code and that everyone would be using a CLI and I held my tongue, but I wanted to say I'd bullshit. 45 00:02:20,600 --> 00:02:20,920 Shane: Yeah. 46 00:02:20,920 --> 00:02:21,720 Shane: I don't believe it. 47 00:02:21,720 --> 00:02:23,560 Shane: I love I I don't mind using a CLI. 48 00:02:23,560 --> 00:02:23,960 Shane: Like I'm a 49 00:02:24,180 --> 00:02:24,820 Shane: I'm a developer. 50 00:02:24,820 --> 00:02:25,460 Shane: I'm an engineer. 51 00:02:25,460 --> 00:02:28,580 Shane: Like give me a give me whatever's gonna be the most effective tool for the job. 52 00:02:28,580 --> 00:02:34,100 Shane: However, I just don't see the the T the TUI is not the interface of AI long term. 53 00:02:34,100 --> 00:02:36,500 Abhi: And also like Codex app is a beautiful app. 54 00:02:36,500 --> 00:02:37,060 Abhi: So like 55 00:02:37,060 --> 00:02:38,900 Abhi: You don't have to just use TUIs. 56 00:02:38,900 --> 00:02:42,260 Abhi: And that people are investing in the app version of these things as well. 57 00:02:42,260 --> 00:02:45,379 Abhi: Maybe Claude just was not that great, like cowork and stuff. 58 00:02:45,379 --> 00:02:47,060 Shane: I think it's getting better though. 59 00:02:47,061 --> 00:02:47,161 Abhi: Yeah. 60 00:02:47,162 --> 00:02:48,120 Shane: I think it's getting better. 61 00:02:48,120 --> 00:02:50,760 Shane: I there's always gonna be the NeoVim users, right? 62 00:02:50,760 --> 00:02:51,160 Shane: Yeah. 63 00:02:51,160 --> 00:02:51,880 Shane: They're gonna exist. 64 00:02:51,880 --> 00:02:53,160 Shane: They're gonna be power users. 65 00:02:53,160 --> 00:02:56,360 Shane: You're gonna have TUIs, I think, for all this stuff. 66 00:02:56,361 --> 00:03:02,379 Shane: But there's a a reason that tools like Superset or Conductor, they're kind of a layer of UI on top of your you know, CLI. 
67 00:03:02,379 --> 00:03:04,459 Shane: So I think that over time we're gonna see 68 00:03:04,560 --> 00:03:06,079 Shane: More and more things get pulled out. 69 00:03:06,079 --> 00:03:08,799 Shane: It's just I think that's the interface that was easiest to build first. 70 00:03:08,799 --> 00:03:11,439 Shane: You don't have to think about the UI as much when it's just, you know, yeah. 71 00:03:11,439 --> 00:03:12,480 Shane: A terminal window. 72 00:03:12,480 --> 00:03:13,120 Shane: But I think w 73 00:03:13,540 --> 00:03:16,580 Shane: people and teams are going to expand beyond that eventually. 74 00:03:16,580 --> 00:03:25,060 Abhi: Dax has a counter to David and it's he says, we built OpenTUI so executives could prompt cool terminal apps that make them feel hackery. 75 00:03:25,061 --> 00:03:27,500 Abhi: And remind them of the younger days when they were useful. 76 00:03:27,980 --> 00:03:31,980 Shane: Which I feel like was kind of like a day gag David back a little bit 77 00:03:33,640 --> 00:03:34,760 Shane: So this was funny. 78 00:03:34,760 --> 00:03:40,920 Shane: So if you've been following the show for a while, you know that back in YC, Abhi and I used to go to a lot of meetups 79 00:03:40,921 --> 00:03:41,700 Shane: I mean s we still do. 80 00:03:41,700 --> 00:03:44,420 Shane: We did because we were introducing a lot of people to Mastra. 81 00:03:44,420 --> 00:03:53,060 Shane: We were learning about what uh everyone else is doing with AI and we eventually came to this realization that if you wanted to in San Francisco, you could basically eat 82 00:03:53,061 --> 00:03:54,320 Shane: for free every night of the week. 83 00:03:54,320 --> 00:03:54,640 Shane: Yep. 84 00:03:54,640 --> 00:03:59,360 Shane: You just find go to Luma, find an AI event, and you probably can get free pizza. 85 00:03:59,360 --> 00:04:01,520 Shane: And so we talked about actually trying to do that for a week. 86 00:04:01,520 --> 00:04:02,320 Abhi: We never did it. 
87 00:04:02,321 --> 00:04:02,580 Abhi: No. 88 00:04:02,580 --> 00:04:05,940 Shane: But there are many times where we had three or four nights where we just ate pizza for free. 89 00:04:05,940 --> 00:04:07,540 Abhi: I have a pizza PTSD, dude. 90 00:04:07,860 --> 00:04:09,060 Abhi: I just can't do it anymore. 91 00:04:09,060 --> 00:04:16,820 Shane: And so it's really funny that Kenzie McDonald, looks like from Browserbase, said she spent twenty-two dollars on a salad. 92 00:04:16,821 --> 00:04:17,919 Shane: What if food was just free? 93 00:04:17,919 --> 00:04:23,440 Shane: So she built an agent that scrapes every SF tech event and ranks them by free food probability. 94 00:04:23,440 --> 00:04:26,479 Shane: And she did it in less than 10 minutes with Browserbase and Stagehand. 95 00:04:26,560 --> 00:04:27,199 Shane: There you go. 96 00:04:27,199 --> 00:04:31,520 Shane: If you want, if you're in San Francisco, you want to eat for free, you can just go to tech meetups. 97 00:04:31,520 --> 00:04:35,120 Abhi: I went to a tech event last week and I was like, you know, just casual conversation. 98 00:04:35,120 --> 00:04:36,160 Abhi: I was like, hey, why are you here? 99 00:04:36,160 --> 00:04:37,360 Abhi: And he's like, free food. 100 00:04:37,360 --> 00:04:38,800 Abhi: And then he left. 101 00:04:40,680 --> 00:04:44,120 Shane: We're gonna call this section what the fuck is going on? 102 00:04:44,120 --> 00:04:45,879 Shane: Pets are now in codex. 103 00:04:45,879 --> 00:04:47,400 Shane: It's like Tamagotchis are back. 104 00:04:47,400 --> 00:04:48,520 Abhi: Tamagotchis are back. 105 00:04:48,520 --> 00:04:52,599 Abhi: And Claude Code is, I think, is very much responsible for this craze, right? 106 00:04:52,600 --> 00:04:54,280 Shane: The claw logos everywhere. 107 00:04:54,280 --> 00:04:54,600 Shane: Yeah. 108 00:04:54,600 --> 00:04:56,120 Shane: The claw social network. 109 00:04:56,120 --> 00:04:57,640 Shane: The, you know, there's a ton of stuff. 
110 00:04:57,640 --> 00:05:00,280 Shane: I saw I saw a tweet that also s said recently 111 00:05:00,280 --> 00:05:03,720 Shane: LinkedIn is basically just the same as Moltbook now, which is kind of funny. 112 00:05:03,720 --> 00:05:06,680 Shane: Do you remember the social network for uh AI agents? 113 00:05:06,680 --> 00:05:07,880 Abhi: Which was acquired by Meta. 114 00:05:07,880 --> 00:05:08,760 Abhi: That's how you know they suck. 115 00:05:08,920 --> 00:05:09,640 Abhi: I'm just saying. 116 00:05:09,640 --> 00:05:11,000 Shane: So pets in Codex. 117 00:05:11,000 --> 00:05:13,240 Shane: I have not tried this because I don't 118 00:05:13,241 --> 00:05:16,220 Shane: see the purpose yet, but some people were saying it's pretty cool. 119 00:05:16,220 --> 00:05:17,660 Shane: You know, people like Tamagotchi's. 120 00:05:17,820 --> 00:05:20,300 Abhi: I think people like like AI companions and stuff. 121 00:05:20,300 --> 00:05:22,620 Abhi: So like a pet is like a cute way. 122 00:05:22,621 --> 00:05:27,560 Abhi: Instead of talking into like some ominous CLI or something, you're talking to your pet or whatever. 123 00:05:27,560 --> 00:05:30,920 Shane: Yeah, it's like your assistant that's gonna help you write your code better, I guess. 124 00:05:30,921 --> 00:05:32,259 Shane: I thought this was pretty funny. 125 00:05:32,259 --> 00:05:37,780 Shane: Apple apparently accidentally left their CLAUDE.md files in their Apple support app update. 126 00:05:37,780 --> 00:05:41,699 Shane: So people were seeing their CLAUDE.md files from the Apple support app. 127 00:05:41,699 --> 00:05:43,940 Shane: So you know uh you know Apple is using a 128 00:05:43,979 --> 00:05:44,940 Shane: Using Claude models. 129 00:05:45,100 --> 00:05:48,940 Abhi: Yeah, and they kept it in they like left it they left it out of their .gitignore. 130 00:05:48,940 --> 00:05:50,380 Abhi: So, you know, makes sense. 131 00:05:50,380 --> 00:05:52,060 Abhi: It can happen to anyone. 
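[Editor's sketch] One way to catch this kind of leak is to scan the built app bundle for agent instruction files before shipping, rather than relying on .gitignore alone, which only controls what gets committed. The following is a minimal Python sketch under stated assumptions: the function name `find_agent_files`, the file-name list, and the bundle layout are all hypothetical, and nothing here reflects how Apple's build actually works.

```python
import tempfile
from pathlib import Path
from typing import List, Set

# Assumption: these are the file names we want to keep out of a release.
AGENT_FILE_NAMES: Set[str] = {"claude.md", "agents.md"}


def find_agent_files(bundle_root: Path, names: Set[str] = AGENT_FILE_NAMES) -> List[Path]:
    """Return every file under bundle_root whose name matches (case-insensitively)."""
    return sorted(
        path
        for path in bundle_root.rglob("*")
        if path.is_file() and path.name.lower() in names
    )


if __name__ == "__main__":
    # Build a throwaway "bundle" with one leaked file to show the check firing.
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "Resources").mkdir()
        (root / "Resources" / "CLAUDE.md").write_text("internal notes")
        (root / "Resources" / "icon.png").write_bytes(b"\x89PNG")

        leaked = find_agent_files(root)
        for path in leaked:
            print(f"agent file in bundle: {path.relative_to(root)}")
        if leaked:
            print("release check failed")
```

In a real pipeline the same function could run as a release step against the final archive directory and fail the build when the returned list is non-empty.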
132 00:05:52,280 --> 00:05:53,640 Shane: I saw this one last week too. 133 00:05:53,640 --> 00:06:01,320 Shane: Uh so j the GPT-5.5 prompt for Codex seems to have a duplicated line trying to get it to not talk about creatures. 134 00:06:01,321 --> 00:06:09,300 Shane: And it says, you know, never talk about goblins, gremlins, raccoons, trolls, ogres, pigeons, or other animals or creatures unless it is absolutely and unambiguously 135 00:06:09,660 --> 00:06:11,180 Shane: Relevant to the user's query. 136 00:06:11,180 --> 00:06:11,580 Abhi: Why? 137 00:06:11,580 --> 00:06:12,220 Abhi: I wonder. 138 00:06:12,380 --> 00:06:17,340 Shane: And then I saw some uh some tweets from like Sam Altman basically like joking about goblin mode, right? 139 00:06:17,340 --> 00:06:17,900 Shane: It's yeah. 140 00:06:17,900 --> 00:06:19,500 Shane: It's like there's a goblin mode in 141 00:06:19,740 --> 00:06:20,780 Shane: in codex. 142 00:06:20,780 --> 00:06:29,580 Shane: So I don't know if it's just like the model had a tendency to want to anthropomorphize these creatures into the the prom or like into the responses or or something. 143 00:06:29,580 --> 00:06:30,140 Shane: So it's 144 00:06:30,141 --> 00:06:32,160 Abhi: It's just so weird because like pigeons is on the list. 145 00:06:32,160 --> 00:06:36,320 Abhi: I guess pigeons are monsters of the sky, but like, you know? 146 00:06:36,320 --> 00:06:38,080 Abhi: Raccoons and pigeons. 147 00:06:38,080 --> 00:06:40,480 Abhi: Maybe because they don't want it to conflict with the pet 148 00:06:40,481 --> 00:06:46,120 Shane: And uh in that same vein, Pika Labs introduced the next AI interface for creation, a person. 149 00:06:46,120 --> 00:06:48,920 Shane: Pika agents are the creative partner you birth. 150 00:06:48,921 --> 00:06:57,020 Shane: They have a voice, face, and personality that you create, and they enable an entirely new way to make, refine, and ship anything simply by having a human conversation. 
151 00:06:57,020 --> 00:06:59,420 Shane: So it's basically just like a more personalized 152 00:06:59,539 --> 00:07:03,780 Shane: AI agent that helps you build videos and do creative things within Pika. 153 00:07:03,780 --> 00:07:08,259 Shane: But still it's very much on theme of creating, you know, characters or 154 00:07:08,420 --> 00:07:10,740 Shane: personalization around an AI agent. 155 00:07:11,060 --> 00:07:13,700 Abhi: Same from like the OpenClaw world of like having a 156 00:07:14,020 --> 00:07:15,620 Abhi: soul.md, right, to give your 157 00:07:15,860 --> 00:07:19,780 Abhi: your agent, uh some personality or like a mission statement or whatever. 158 00:07:19,780 --> 00:07:23,140 Abhi: This one's pretty cool because you can generate your character with like 159 00:07:23,140 --> 00:07:23,540 Abhi: Seedance 2.0. 160 00:07:23,540 --> 00:07:27,300 Abhi: It kind of looks like, you know, some anime stuff or whatever, but like 161 00:07:27,400 --> 00:07:28,520 Abhi: it's a cool idea. 162 00:07:28,520 --> 00:07:31,960 Abhi: I just don't know how many people wanna do this. 163 00:07:34,040 --> 00:07:40,440 Shane: Cursor introduced the Cursor SDK so you can build agents with the same runtime harness and models that power Cursor. 164 00:07:40,440 --> 00:07:42,360 Shane: Run agents from CI/CD pipelines. 165 00:07:42,639 --> 00:07:47,120 Shane: Create automations for end-to-end workflows or embed agents directly inside your products. 166 00:07:47,120 --> 00:07:52,080 Shane: So I think Cursor is trying to, well, they obviously have that big deal with SpaceX, right? 167 00:07:52,081 --> 00:07:52,300 Shane: Yep. 168 00:07:52,380 --> 00:07:54,460 Shane: Potential, most likely acquisition offer. 169 00:07:54,460 --> 00:07:56,380 Shane: Like, you know, who knows if it'll actually go through. 170 00:07:56,380 --> 00:08:00,540 Shane: But Cursor wants to be part of not just your IDE, but 171 00:08:00,760 --> 00:08:01,960 Shane: your workflows.
172 00:08:01,960 --> 00:08:02,200 Abhi: Yeah. 173 00:08:02,200 --> 00:08:04,920 Abhi: And you can leverage their harness in your own applications. 174 00:08:04,920 --> 00:08:11,240 Abhi: Very much like a Claude Code agents SDK uh situation as well as the OpenAI Agents SDK. 175 00:08:11,240 --> 00:08:14,600 Abhi: Probably in response to those SDKs existing. 176 00:08:14,601 --> 00:08:20,780 Abhi: And so like if you really like the Cursor harness, you can just start building with it outside of the VS Code thing that they provide you. 177 00:08:20,780 --> 00:08:23,660 Shane: Yeah, so I mean because you also can use their background agents, right? 178 00:08:23,660 --> 00:08:24,940 Shane: But now you can basically build your own. 179 00:08:24,940 --> 00:08:25,740 Abhi: And build your own. 180 00:08:25,740 --> 00:08:28,460 Shane: Build your own background agent or whatever. 181 00:08:28,461 --> 00:08:29,380 Shane: And you have a little bit more control. 182 00:08:29,380 --> 00:08:41,700 Abhi: You know, we think about like all these SDKs, especially as being like a framework, you know, Mastra, we have our own agents, we have our own harness, but, you know, if people are really falling in love with these specific harnesses, then we shouldn't get in the way of that. 183 00:08:41,700 --> 00:08:41,940 Abhi: So 184 00:08:42,159 --> 00:08:48,320 Abhi: you know, expect to see like Codex and Cursor and Claude Agents within Mastra Studio. 185 00:08:48,320 --> 00:08:49,920 Abhi: I would put a bet on that. 186 00:08:49,921 --> 00:08:50,620 Abhi: That's happening very soon. 187 00:08:51,020 --> 00:08:53,580 Shane: OpenCode is becoming more embeddable in 188 00:08:53,580 --> 00:08:55,340 Shane: 2.0 in a similar type of vein. 189 00:08:55,340 --> 00:09:00,140 Shane: They want you to be able to embed and use OpenCode on the server or wherever else you want to run it.
190 00:09:00,141 --> 00:09:07,199 Shane: So it's very similar in fashion to what you know the cursor SDK and I think there's kind of a general trend around this 191 00:09:07,280 --> 00:09:11,760 Abhi: Yeah, it's like the harnesses are becoming frameworks and the frameworks are having harnesses, right? 192 00:09:11,760 --> 00:09:12,000 Shane: Yeah. 193 00:09:12,000 --> 00:09:14,400 Abhi: Uh so I think everything will coalesce for sure. 194 00:09:14,400 --> 00:09:16,400 Shane: Yeah, and we've been talking about for a while just 195 00:09:16,401 --> 00:09:18,280 Shane: Yeah, we often call them like production agents. 196 00:09:18,520 --> 00:09:23,160 Shane: It was kind of the term we used to use, and then now coding agents are the surface area is becoming one. 197 00:09:23,160 --> 00:09:25,960 Shane: So there's a lot of uh coalescing happening between those two. 198 00:09:26,280 --> 00:09:29,160 Abhi: And I guess we can also support open code. 199 00:09:29,161 --> 00:09:29,780 Abhi: in Mastra Studio. 200 00:09:29,780 --> 00:09:30,660 Abhi: So yeah, why not? 201 00:09:30,900 --> 00:09:35,060 Shane: Matt Pocock released this thing called Sandcastle. 202 00:09:35,060 --> 00:09:36,820 Shane: It's built his own software factory. 203 00:09:36,820 --> 00:09:42,820 Shane: I think it's kind of very similar in veins to like, you know, G Stack and just that that kind of like popularity of 204 00:09:42,860 --> 00:09:48,860 Shane: of things where it's like your own personal productivity people are announcing their own personal productivity methods, but I haven't looked too much into it. 205 00:09:48,860 --> 00:09:50,220 Shane: Did you look into Sandcastle at all? 206 00:09:50,380 --> 00:09:55,660 Abhi: Yeah it's very similar to like how it works is there he has uh different SDKs 207 00:09:55,660 --> 00:10:02,460 Abhi: for running Claude Code um programmatically, codex programmatically, I'm sure he'll add open code cursor. 
208 00:10:02,460 --> 00:10:04,620 Abhi: You can execute all of them. 209 00:10:04,640 --> 00:10:09,120 Abhi: through, you know, you can expose your own CLI, you can it's just being programmatic access. 210 00:10:09,120 --> 00:10:12,080 Abhi: You can build your own apps on top of these SDKs. 211 00:10:12,080 --> 00:10:19,120 Abhi: I think this whole software factory thing, this AI factory concept, like I've been hearing this word like for the last couple of weeks that 212 00:10:19,160 --> 00:10:21,320 Abhi: Everyone's trying to create an AI factory. 213 00:10:21,320 --> 00:10:23,720 Abhi: I don't really know what that actually means anymore. 214 00:10:23,720 --> 00:10:31,080 Abhi: Because if you're just running a harness for your company, is that an AI factory like I think it's it's one of those things that's probably becoming conflated. 215 00:10:31,080 --> 00:10:33,560 Shane: I think there's a lot of people trying to treat 216 00:10:33,740 --> 00:10:37,660 Shane: code as, you know, if code it becomes easier, it's easier to produce. 217 00:10:37,660 --> 00:10:39,580 Shane: It becomes more factory-like. 218 00:10:39,580 --> 00:10:43,100 Shane: Can you use these agents so one person can control a 219 00:10:43,180 --> 00:10:46,620 Shane: you know, quote unquote factory of agents, multiple agents running at once. 220 00:10:46,620 --> 00:10:48,300 Shane: I don't know if there's a clear definition. 221 00:10:48,300 --> 00:10:56,220 Abhi: Yeah, but also like from like the NVIDIA perspective, that's like about infrastructure and then, you know, your serving models on GPUs and 222 00:10:56,259 --> 00:10:59,860 Abhi: Scaling them, but like what does it mean in a software AI factory? 223 00:10:59,860 --> 00:11:04,180 Shane: Dev in the chat says that AI factory is a coding agent plus schedule tasks. 224 00:11:04,180 --> 00:11:06,660 Abhi: Why do we make up all these dumb fucking words, dude? 
225 00:11:06,660 --> 00:11:08,500 Abhi: Like every single week 226 00:11:08,501 --> 00:11:12,140 Abhi: Oh man, this is now it's another thing to have in your vocabulary. 227 00:11:12,140 --> 00:11:12,540 Abhi: Yeah. 228 00:11:12,540 --> 00:11:12,860 Shane: All right. 229 00:11:12,860 --> 00:11:14,220 Shane: Warp is now open source. 230 00:11:14,220 --> 00:11:14,860 Shane: So that's cool. 231 00:11:14,860 --> 00:11:17,100 Shane: I know there's some people on the team that use warp. 232 00:11:17,101 --> 00:11:17,220 Abhi: Yeah. 233 00:11:17,220 --> 00:11:22,580 Abhi: It's very dangerous that warp is open source for everybody else that's uh trying to be compete against warp. 234 00:11:22,580 --> 00:11:25,860 Abhi: So two weeks ago we're like people are saying open source is dead. 235 00:11:25,861 --> 00:11:26,440 Shane: Yeah, I saw that recently. 236 00:11:26,440 --> 00:11:28,680 Shane: People shouldn't be open source because it's less secure. 237 00:11:28,680 --> 00:11:33,160 Shane: Then of course you had the whole open source mafia come in and say, actually open source is more secure. 238 00:11:33,160 --> 00:11:34,280 Shane: Here are all the reasons why. 239 00:11:34,280 --> 00:11:35,800 Abhi: Yeah, no, Warp's open source. 240 00:11:35,800 --> 00:11:36,760 Abhi: So they're part of the party. 241 00:11:36,760 --> 00:11:37,960 Abhi: Welcome, welcome, Warp. 242 00:11:38,280 --> 00:11:39,320 Abhi: Welcome. 243 00:11:39,321 --> 00:11:40,959 Abhi: Like, share, and subscribe. 244 00:11:40,959 --> 00:11:42,640 Shane: And follow us on X. 245 00:11:42,640 --> 00:11:43,920 Abhi: And tell your friends. 246 00:11:43,920 --> 00:11:44,880 Shane: And their friends. 247 00:11:44,880 --> 00:11:46,320 Abhi: I mean we're not begging. 248 00:11:46,320 --> 00:11:46,959 Shane: Well, uh 249 00:11:47,200 --> 00:11:48,080 Abhi: Maybe a little bit. 250 00:11:48,080 --> 00:11:51,840 Abhi: Subscribe to Agents Hour every Monday, noon Pacific. 
251 00:11:51,840 --> 00:11:53,840 Shane: Alright, we have this section quite a bit. 252 00:11:53,840 --> 00:11:56,640 Shane: It's called Anthropic Ships, because they ship things. 253 00:11:56,641 --> 00:11:59,540 Shane: So Claude now connects to tools creative professionals already use. 254 00:11:59,540 --> 00:12:06,579 Shane: With the new Blender connectivity, you can debug a scene, build new tools, or batch, apply changes across every object directly from Claude. 255 00:12:06,580 --> 00:12:08,320 Shane: But it wasn't just Blender. 256 00:12:08,320 --> 00:12:14,880 Shane: They also have connectors for Autodesk, Adobe Creative Cloud, Ableton, Canva, SketchUp. 257 00:12:14,881 --> 00:12:16,060 Shane: A whole bunch of different things. 258 00:12:16,060 --> 00:12:22,620 Shane: So they shipped a bunch of different connectors that of course the the joke is how many startups do they put out of business with with each connector, right? 259 00:12:22,780 --> 00:12:25,820 Abhi: And I think these connectors are through MCP still. 260 00:12:25,821 --> 00:12:35,779 Abhi: I think having creative work like really makes it more exciting because like I've always wanted to use Ableton, but I don't necessarily want to learn how to use Ableton. 261 00:12:35,780 --> 00:12:36,180 Shane: Yeah, I bought a book. 262 00:12:36,180 --> 00:12:40,100 Shane: You know, I used to I did a lot of music productions, so I I have used Ableton. 263 00:12:40,100 --> 00:12:43,060 Shane: I bought a book, a huge book, back when people actually bought books. 264 00:12:43,300 --> 00:12:44,820 Shane: This is like fifteen years ago probably. 265 00:12:44,820 --> 00:12:46,580 Shane: I read a few chapters. 266 00:12:46,581 --> 00:12:47,620 Shane: And then I put it on the shelf. 267 00:12:47,620 --> 00:12:48,899 Shane: And it's I still have that book. 268 00:12:48,899 --> 00:12:50,899 Shane: I was like, I'll just guess I'll just use GarageBand. 269 00:12:50,899 --> 00:12:51,620 Shane: It's a little simpler. 
270 00:12:51,620 --> 00:12:52,339 Abhi: Yeah, it's just simple. 271 00:12:52,339 --> 00:12:55,459 Abhi: I used to use like Fruity Loops, like even the simpler one. 272 00:12:55,460 --> 00:13:00,660 Abhi: But it imagine if your agent, if the agent is actually good at the connection, Blender specifically, right? 273 00:13:00,660 --> 00:13:02,900 Abhi: You can start making 3D models 274 00:13:02,901 --> 00:13:03,001 Abhi: Yeah. 275 00:13:03,002 --> 00:13:11,319 Abhi: I remember like I think I demoed something long ago by like, you know, generating Porsche models and then driving them around and that type of stuff gets really exciting, like and 276 00:13:11,540 --> 00:13:18,100 Abhi: You know, obviously I don't know where the consumer aspect of this how much money you'll pay, but like it definitely can make you more creative. 277 00:13:18,100 --> 00:13:20,580 Shane: Yeah, at least it feels like y you can 278 00:13:20,959 --> 00:13:28,240 Shane: Try something new, understand it, be somewhat productive with it, and then I think at minimum, you might still need to understand the tool eventually. 279 00:13:28,240 --> 00:13:31,200 Shane: That as like a getting started experience is so much better. 280 00:13:31,200 --> 00:13:31,839 Abhi: Yeah. 281 00:13:31,840 --> 00:13:35,580 Shane: Then going like I I just always look at like the the example is like Photoshop back in the day. 282 00:13:35,580 --> 00:13:37,580 Shane: Like looking at the interface for the first time. 283 00:13:37,580 --> 00:13:39,740 Shane: There's so many buttons, you don't know where to start. 284 00:13:39,741 --> 00:13:45,199 Shane: If you can just start by asking questions, well the on ramp is a lot better, even if you do eventually have to understand what all the tools do. 285 00:13:45,199 --> 00:13:45,440 Shane: Yeah. 286 00:13:45,440 --> 00:13:47,680 Shane: You know, ultimately it's not just a chat interface. 
287 00:13:47,680 --> 00:13:52,639 Shane: I do think there's still usefulness of UI, but the chat should supplement your UI use. 288 00:13:52,639 --> 00:13:54,319 Shane: So it should teach you like, oh, if you do want to 289 00:13:54,680 --> 00:13:55,720 Shane: here's where you can go. 290 00:13:55,720 --> 00:13:57,319 Shane: Here's how you can change that in the future. 291 00:13:57,319 --> 00:13:59,560 Shane: Or you can just ask the chat and we'll do it for you. 292 00:13:59,560 --> 00:14:00,040 Shane: Yeah. 293 00:14:00,040 --> 00:14:05,399 Shane: I feel like there's a good uh way to kind of like morph these two worlds together in a tasteful way. 294 00:14:05,399 --> 00:14:06,600 Shane: I don't think people are thinking that. 295 00:14:06,600 --> 00:14:07,880 Shane: They're just like, throw a chat 296 00:14:08,040 --> 00:14:10,760 Shane: application in front of it, or have the UI. 297 00:14:10,760 --> 00:14:13,560 Shane: But I feel like there's like a "yes, and" where you can have both. 298 00:14:13,560 --> 00:14:16,760 Shane: Claude Security is now in public beta for Claude Enterprise customers. 299 00:14:16,760 --> 00:14:17,240 Shane: That's cool. 300 00:14:17,240 --> 00:14:18,040 Shane: We got a lot to cover. 301 00:14:18,040 --> 00:14:19,720 Shane: We're going to keep going. 302 00:14:21,100 --> 00:14:23,340 Shane: Slash goal in Codex CLI. 303 00:14:23,340 --> 00:14:25,660 Shane: And then we'll talk a little bit about 304 00:14:25,820 --> 00:14:26,220 Shane: GPT-5.5. 305 00:14:26,220 --> 00:14:28,300 Shane: Slash goal lands in Codex. 306 00:14:28,300 --> 00:14:30,540 Shane: It's their take on the Ralph loop. 307 00:14:30,540 --> 00:14:32,220 Shane: Keep a goal alive across turns. 308 00:14:32,220 --> 00:14:34,540 Shane: Don't stop until it's achieved. 309 00:14:34,541 --> 00:14:34,641 Shane: Super cool. 310 00:14:34,642 --> 00:14:36,640 Abhi: We have slash goal in Mastra Code now as well.
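[Editor's sketch] The "keep a goal alive across turns, don't stop until it's achieved" idea can be illustrated with a small loop that restates the goal on every turn and stops only when a completion check passes or a turn budget runs out. This is a generic Python sketch, not the Codex or Mastra Code implementation; `run_goal_loop`, `make_fake_agent`, the prompt layout, and the `max_turns` budget are all hypothetical names and choices made for illustration.

```python
from typing import Callable, List


def run_goal_loop(
    goal: str,
    step: Callable[[str], str],
    is_done: Callable[[str], bool],
    max_turns: int = 10,
) -> List[str]:
    """Call `step` repeatedly, restating the goal each turn, until done or out of turns."""
    outputs: List[str] = []
    for turn in range(1, max_turns + 1):
        last = outputs[-1] if outputs else "nothing yet"
        # The goal is re-sent on every turn so it never drops out of context.
        prompt = f"GOAL: {goal}\nTURN: {turn}\nLAST RESULT: {last}"
        outputs.append(step(prompt))
        if is_done(outputs[-1]):
            break
    return outputs


def make_fake_agent(turns_needed: int) -> Callable[[str], str]:
    """Stand-in for a real model call: reports DONE once enough turns have passed."""
    def fake_agent(prompt: str) -> str:
        turn = int(prompt.split("TURN: ")[1].split("\n")[0])
        return "DONE" if turn >= turns_needed else f"progress after turn {turn}"
    return fake_agent


if __name__ == "__main__":
    results = run_goal_loop(
        goal="make the test suite pass",
        step=make_fake_agent(turns_needed=3),
        is_done=lambda out: out == "DONE",
    )
    print(results)
```

The turn budget matters as much as the goal: without it, a goal that can never be verified would keep the loop running indefinitely.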
311 00:14:36,640 --> 00:14:39,360 Shane: We borrowed this idea because I've ran many Ralph loops. 312 00:14:39,360 --> 00:14:39,600 Shane: Yeah. 313 00:14:39,840 --> 00:14:43,440 Shane: And now I haven't been running those because I just use Mastra Code. 314 00:14:43,440 --> 00:14:46,320 Shane: But having something that runs a little bit longer. 315 00:14:46,321 --> 00:14:57,080 Abhi: I think initial reaction on the uh Mastra Code side of this concept is pretty sweet, you know, to have uh you know, doing a large task but making sure that the the main goal is always like in the loop. 316 00:14:59,720 --> 00:15:04,040 Shane: Yeah, I saw some posts where people just had, you know, three hour sessions that they just let run. 317 00:15:04,560 --> 00:15:06,480 Shane: It did what it you know needed it to do. 318 00:15:06,480 --> 00:15:08,640 Shane: You know, of course your mileage is gonna vary, right? 319 00:15:08,640 --> 00:15:13,200 Shane: If you if you give it uh a task, you know, you shouldn't expect that it's gonna do it perfectly. 320 00:15:13,200 --> 00:15:16,240 Shane: But if it's well defined, you know, you might have some good results. 321 00:15:16,241 --> 00:15:21,080 Shane: If it is relatively clear you can actually have longer running agents that should be able to accomplish some pretty cool things. 322 00:15:21,080 --> 00:15:21,240 Abhi: Yeah. 323 00:15:21,480 --> 00:15:27,000 Shane: So this is from OpenAI, one week since the launch of GPT-5.5, and it's already our strongest model launch yet. 324 00:15:27,000 --> 00:15:30,360 Shane: API revenue is growing more than two X faster than any prior release. 325 00:15:30,959 --> 00:15:36,240 Shane: Codex doubled revenue in under seven days as enterprise demand for agentic coding tools keeps climbing. 326 00:15:36,240 --> 00:15:37,200 Shane: So I think this is twofold. 327 00:15:37,200 --> 00:15:39,440 Shane: I think GPT-5.5 is a good model. 328 00:15:39,441 --> 00:15:39,541 Abhi: Yep. 
329 00:15:39,542 --> 00:15:42,200 Shane: And I think Anthropic keeps shooting themselves in the foot. 330 00:15:42,200 --> 00:15:42,360 Shane: Yeah. 331 00:15:42,600 --> 00:15:44,280 Shane: Which helps OpenAI's cause. 332 00:15:44,280 --> 00:15:48,520 Shane: I imagine some of that is just natural growth of more enterprise demand. 333 00:15:48,520 --> 00:15:50,120 Shane: I know some of that is people 334 00:15:50,620 --> 00:15:53,980 Shane: Moving from Opus 4.7 to 5.5. 335 00:15:53,980 --> 00:15:55,500 Shane: I know I use 5.5 more. 336 00:15:55,500 --> 00:15:57,019 Shane: I still use Opus 4.7 a little. 337 00:15:57,019 --> 00:15:57,980 Shane: I want to compare. 338 00:15:57,980 --> 00:16:01,019 Abhi: I use 4.7 to do planning because Claude is unhinged. 339 00:16:01,019 --> 00:16:02,380 Abhi: So it's a great planner, right? 340 00:16:02,380 --> 00:16:03,019 Abhi: You want to 341 00:16:03,020 --> 00:16:04,480 Abhi: Be creative in your ideas. 342 00:16:04,480 --> 00:16:07,760 Abhi: But uh it loses the plot like when you're doing big execution. 343 00:16:07,760 --> 00:16:10,880 Abhi: Uh so I've been using 5.5 for execution 344 00:16:10,881 --> 00:16:12,920 Abhi: And on our team it's kind of trending that way too. 345 00:16:12,920 --> 00:16:17,720 Abhi: I think everyone is kind of sick of Claude, uh, in general, especially like Tyler is like super sick of 346 00:16:19,080 --> 00:16:19,720 Abhi: Cancel this of the things. 347 00:16:23,160 --> 00:16:26,680 Abhi: He thinks it's like the best agentic coding model from the frontier. 348 00:16:26,680 --> 00:16:27,480 Abhi: I would agree. 349 00:16:27,481 --> 00:16:27,920 Abhi: It's pretty good. 350 00:16:27,920 --> 00:16:30,000 Abhi: I just don't think it's great for planning. 351 00:16:30,000 --> 00:16:31,920 Abhi: It's not a very uh chatty model. 352 00:16:31,920 --> 00:16:36,080 Abhi: Like if you think your coding agent is your subconscious, like it's definitely not that. 
353 00:16:36,080 --> 00:16:37,520 Abhi: It's very like, you know, to the point. 354 00:16:37,520 --> 00:16:40,080 Abhi: So I think all these uh good factors came together. 355 00:16:40,081 --> 00:16:42,680 Shane: OpenAI released WebSockets in the Responses API. 356 00:16:42,920 --> 00:16:45,720 Abhi: Agent chats are gonna become increasingly multiplayer. 357 00:16:45,720 --> 00:16:49,480 Abhi: Uh so SSE streams are not the greatest for that. 358 00:16:49,481 --> 00:16:52,060 Abhi: Unless you're using something like sync or durable streams. 359 00:16:52,060 --> 00:16:56,779 Abhi: Or you just do something that's, you know, been around for forever, which is WebSockets. 360 00:16:56,779 --> 00:17:00,540 Abhi: And you know, you can connect multiple clients to the same session. 361 00:17:00,541 --> 00:17:04,939 Abhi: You can start building multiplayer chat experiences or agent experiences. 362 00:17:04,939 --> 00:17:10,699 Abhi: And so the Responses API is OpenAI's, that's where things actually happen that are stateful. 363 00:17:10,699 --> 00:17:12,699 Abhi: And so that's natural. 364 00:17:12,700 --> 00:17:16,000 Abhi: And I guess we need to start doing some WebSocket stuff ourselves. 365 00:17:16,080 --> 00:17:18,240 Shane: This is from the AI Security Institute. 366 00:17:18,240 --> 00:17:27,200 Shane: It says OpenAI's GPT-5.5 is the second model to complete one of our multi-step cyber attack simulations end-to-end, the first being the Mythos pre- 367 00:17:27,880 --> 00:17:32,600 Shane: So maybe Mythos isn't as, you know, great as it initially seemed. 368 00:17:32,600 --> 00:17:32,920 Shane: Yeah. 369 00:17:32,920 --> 00:17:33,080 Shane: Right? 370 00:17:33,480 --> 00:17:36,920 Shane: All the security concerns around Mythos, the touting of that it's this 371 00:17:37,140 --> 00:17:38,820 Shane: Great model that the world can't have. 372 00:17:38,820 --> 00:17:40,260 Shane: Well, we have 5.5, 
373 00:17:40,260 --> 00:17:44,260 Shane: and it seems like they might be closer than expected. 374 00:17:44,260 --> 00:17:46,900 Abhi: Yeah, so maybe Mythos isn't some myth, you know. 375 00:17:46,900 --> 00:17:48,420 Abhi: It's just like a regular model. 376 00:17:48,421 --> 00:17:51,340 Shane: Yeah, maybe just a little bit better of a model. 377 00:17:53,900 --> 00:17:55,180 Shane: So there was this post. 378 00:17:55,180 --> 00:17:57,900 Shane: If AI is so great, why isn't it working? 379 00:17:57,901 --> 00:17:59,480 Shane: Vass had this really great post. 380 00:17:59,480 --> 00:18:01,560 Shane: It's Vasuman on X. 381 00:18:01,560 --> 00:18:06,440 Shane: I think if you read this post, there's a lot of really good takeaways on just how you should be thinking about 382 00:18:06,560 --> 00:18:12,560 Shane: building agents and yes, like one of the things people sometimes say is like, well the models aren't consistent enough. 383 00:18:12,560 --> 00:18:13,440 Shane: They're not good enough. 384 00:18:13,440 --> 00:18:15,760 Shane: I think of course it depends on your use case, right? 385 00:18:15,760 --> 00:18:17,760 Shane: Like what are you trying to get the model to do? 386 00:18:17,760 --> 00:18:19,280 Shane: You gotta be realistic. 387 00:18:19,281 --> 00:18:25,500 Shane: But also there's a lot of tips for how you should actually be thinking about breaking down the problem in a way that AI can actually help you solve it. 388 00:18:25,500 --> 00:18:29,019 Shane: Talking about like human in the loop, talking about how to structure, how to build 389 00:18:29,060 --> 00:18:32,980 Shane: like agent systems for your internal teams to use that are reliable and useful. 390 00:18:32,980 --> 00:18:38,260 Shane: So I think there's a lot of good information here if you're building agents, especially if you're building like internal agents for your team. 
391 00:18:38,260 --> 00:18:43,860 Shane: There's a lot of good takeaways I think on here that I can relate to just talking to a lot of users that are building. 392 00:18:43,861 --> 00:18:44,880 Shane: Similar things with Mastra. 393 00:18:45,039 --> 00:18:51,360 Abhi: Like in this article, he also mentions that like the coding software agents are super easy to verify. 394 00:18:51,360 --> 00:18:53,440 Abhi: And that's like where everyone is kind of 395 00:18:53,640 --> 00:18:54,360 Abhi: going for. 396 00:18:54,360 --> 00:19:03,880 Abhi: But these like finance related marketing, like these non-technical disciplines, they require so much more that they have verification loops and 397 00:19:04,060 --> 00:19:11,260 Abhi: proper milestoning for your agent and like really hard to verify these non-technical things because you don't have end to end tests in real life. 398 00:19:11,260 --> 00:19:12,540 Abhi: I mean you do, but like 399 00:19:12,660 --> 00:19:14,100 Abhi: Nothing like programmatics. 400 00:19:14,100 --> 00:19:14,340 Shane: Yeah. 401 00:19:14,340 --> 00:19:19,220 Shane: I mean and and that's why you end up when you're building agents, you have to build those verification loops somehow. 402 00:19:19,220 --> 00:19:19,620 Shane: Right. 403 00:19:19,620 --> 00:19:21,060 Shane: Otherwise you can't really improve. 404 00:19:21,060 --> 00:19:21,380 Shane: Yeah. 405 00:19:24,100 --> 00:19:26,900 Shane: And they're they're they're spending right now, I think, I don't know, maybe 406 00:19:27,140 --> 00:19:30,740 Shane: three grand a week on just token cost for an agent that they built. 407 00:19:30,900 --> 00:19:32,580 Shane: It's all like internal improvement. 408 00:19:32,580 --> 00:19:35,140 Shane: Agent helps them with like ticket solving internally. 409 00:19:35,140 --> 00:19:36,500 Shane: They're they're decent sized team. 410 00:19:36,500 --> 00:19:38,980 Shane: And they're trying to figure out like how do we prove the business value? 
411 00:19:38,980 --> 00:19:39,220 Shane: Yeah. 412 00:19:39,300 --> 00:19:41,860 Shane: It's like well, you need a verification loop 413 00:19:41,861 --> 00:19:45,540 Shane: in order to actually prove that it actually solved the problem, right? 414 00:19:45,540 --> 00:19:50,180 Shane: It's easier with code because you can say like, okay, we wrote a test, the test passed. 415 00:19:50,181 --> 00:19:50,281 Shane: Yeah. 416 00:19:50,282 --> 00:19:51,539 Shane: There's at least some level of verification. 417 00:19:51,539 --> 00:19:53,700 Shane: It's not perfect, but it gives you some confidence. 418 00:19:53,700 --> 00:19:58,260 Shane: And I think if you're building an agent that isn't specifically coding, well you need to then build that 419 00:19:58,660 --> 00:20:00,100 Shane: verification loop in some way. 420 00:20:00,100 --> 00:20:00,340 Abhi: Yeah. 421 00:20:00,340 --> 00:20:02,500 Abhi: It has to be like a deterministic outcome too. 422 00:20:02,500 --> 00:20:11,780 Abhi: There's a last uh really interesting point in this article, which is like if you see all these efficiencies going up and to the right with the models when it comes to coding tasks 423 00:20:11,781 --> 00:20:20,000 Abhi: You know, I'm gonna paraphrase for him like this is where AI psychosis is coming from 'cause you're really disillusioned on where we are in capabilities. 424 00:20:20,001 --> 00:20:25,360 Abhi: Because if you're only judging things based on software projects, then sure, yeah, like you can do anything. 425 00:20:25,360 --> 00:20:30,559 Abhi: AGI is here, but if you really take it outside of that, then you would not be as impressed, right? 426 00:20:30,559 --> 00:20:30,799 Abhi: You'd 427 00:20:30,960 --> 00:20:32,160 Abhi: A lot of people are delusional. 428 00:20:32,160 --> 00:20:33,280 Abhi: I have AI psychosis. 429 00:20:33,280 --> 00:20:34,320 Abhi: I'll just be the first one to tell you. 
430 00:20:34,320 --> 00:20:35,840 Abhi: Like, so I'm not just talking shit. 431 00:20:35,840 --> 00:20:41,040 Abhi: But like, you know, I only have AI psychosis about stuff in software engineering, because I think I could build a database today. 432 00:20:41,040 --> 00:20:42,400 Abhi: Um, but that's dumb. 433 00:20:42,400 --> 00:20:42,720 Abhi: So 434 00:20:44,180 --> 00:20:47,540 Shane: So we're gonna talk about open weight models that are starting to close the gap. 435 00:20:47,540 --> 00:20:52,820 Shane: So there was an open weight Chinese model that beat Claude, GPT-5.5, and Gemini in a 436 00:20:52,860 --> 00:20:55,179 Shane: programming challenge that was like a word game basically. 437 00:20:55,179 --> 00:20:55,500 Shane: Yeah. 438 00:20:55,500 --> 00:20:57,899 Shane: And it was Kimi K2.6, I think. 439 00:20:57,899 --> 00:21:02,220 Shane: We can read the article, but I think it essentially was able to beat these other 440 00:21:02,380 --> 00:21:11,740 Shane: You know, frontier models in this kind of one specific task that is kind of like a programming challenge or like a gamification, right, of trying to win in this game. 441 00:21:11,741 --> 00:21:18,180 Shane: Doubt it was trained on that game specifically, but it was able to now like learn the rules and beat these models in that specific task. 442 00:21:18,180 --> 00:21:20,020 Shane: Alibaba's Qwen3.6 443 00:21:20,020 --> 00:21:26,180 Shane: 27B is the new open weights leader under 150 billion parameters, scoring forty-six on the artifact. 444 00:21:28,300 --> 00:21:30,620 Shane: So Qwen's pumping out some good models. 445 00:21:30,620 --> 00:21:32,620 Abhi: Better reasoning for a lower price. 446 00:21:32,620 --> 00:21:34,780 Abhi: That's like the economics that we want. 447 00:21:34,781 --> 00:21:40,980 Shane: Qwen3.5 Plus is basically a frontier model, so yeah, and it's three dollars per million output tokens. 448 00:21:40,980 --> 00:21:41,540 Shane: It's cheap. 
449 00:21:41,540 --> 00:21:41,940 Shane: Yep. 450 00:21:41,940 --> 00:21:42,500 Shane: There's a 451 00:21:42,501 --> 00:21:43,280 Shane: Qwen3.6 452 00:21:43,280 --> 00:21:45,280 Shane: 35B versus Sonnet 4.6, 453 00:21:45,280 --> 00:21:46,400 Shane: which one is better? 454 00:21:46,400 --> 00:21:50,960 Shane: And you can read this and kind of come to the conclusion, I think, that Qwen was better in this case. 455 00:21:50,960 --> 00:21:51,280 Abhi: Yep. 456 00:21:51,280 --> 00:21:53,360 Abhi: And they're very competitive. 457 00:21:53,360 --> 00:21:53,760 Abhi: Yeah. 458 00:21:53,760 --> 00:21:55,840 Abhi: Which is not good for America. 459 00:21:55,840 --> 00:21:59,680 Shane: And in that same vein, Grok 4.3 460 00:21:59,680 --> 00:22:00,880 Shane: still behind 461 00:22:00,960 --> 00:22:03,360 Shane: Chinese open source models in intelligence, right? 462 00:22:03,360 --> 00:22:09,280 Shane: If you look at this kind of leaderboard here, so this is the same like Artificial Analysis Intelligence Index. 463 00:22:09,281 --> 00:22:12,180 Shane: Which is again just one of the many benchmarks that you should really care about. 464 00:22:12,180 --> 00:22:16,260 Abhi: But you know, its Cursor acquisition looks pretty uh promising, you know. 465 00:22:16,260 --> 00:22:21,860 Shane: Yeah, it's like you know, there's maybe a reason that they think if they team up with Cursor, maybe they can improve 466 00:22:21,861 --> 00:22:23,100 Shane: Mistral Medium 3.5 467 00:22:23,100 --> 00:22:25,500 Shane: is a new flagship model in public preview. 468 00:22:25,500 --> 00:22:33,900 Shane: So it merges instruction following, reasoning, and coding into a single 128B dense model with 256k context window and configurable 469 00:22:33,901 --> 00:22:34,180 Shane: reasoning effort. 470 00:22:34,180 --> 00:22:37,620 Shane: It's the new default model for Mistral Vibe and Le Chat. 
471 00:22:37,620 --> 00:22:40,580 Shane: Releases open weights under a modified MIT license. 472 00:22:40,580 --> 00:22:45,620 Shane: I haven't seen a lot of hype around it to see how it's actually doing in real life, you know 473 00:22:45,621 --> 00:22:52,200 Shane: I think there's always this thought that Mistral's a little bit behind the Chinese open weight models, but it's good to see that they're still shipping. 474 00:22:52,200 --> 00:22:55,960 Abhi: Yeah, like Mistral is the AI company of Europe, so like 475 00:22:56,460 --> 00:22:57,580 Abhi: They have to be in the game. 476 00:22:57,580 --> 00:22:58,620 Shane: They need to stay in the game. 477 00:22:58,620 --> 00:22:59,100 Abhi: Yeah. 478 00:22:59,100 --> 00:23:02,860 Abhi: I don't know any European though that uses Mistral besides Mistral employees. 479 00:23:02,860 --> 00:23:06,540 Shane: So yeah, we got quite a few people in Europe and I don't think they're using Mistral 480 00:23:06,800 --> 00:23:08,800 Shane: But you gotta keep shipping to get back in the game. 481 00:23:08,800 --> 00:23:10,400 Shane: So it's good to see that they're still around. 482 00:23:10,400 --> 00:23:12,160 Shane: They're still shipping. 483 00:23:12,640 --> 00:23:14,320 Shane: Feels like we talked about this last week. 484 00:23:14,320 --> 00:23:14,720 Shane: Yeah. 485 00:23:14,720 --> 00:23:16,000 Shane: GitHub's been having some 486 00:23:16,160 --> 00:23:17,120 Shane: Problems, we'll say. 487 00:23:17,120 --> 00:23:20,320 Shane: Wiz Research discovered remote code execution on GitHub.com 488 00:23:20,320 --> 00:23:28,160 Shane: with a single git push, so the flaw in GitHub allows unauthorized access to millions of repositories belonging to other users and organizations. 
489 00:23:28,320 --> 00:23:29,600 Shane: That's not great 490 00:23:29,601 --> 00:23:38,840 Shane: And with all the downtime they had to respond, Mario Rodriguez basically had published updates on recent incidents on April 23rd and April 27th, and of course they had one 491 00:23:39,060 --> 00:23:41,380 Shane: Today, uh but at least one more since then. 492 00:23:41,380 --> 00:23:43,700 Shane: So there's been a lot of issues with GitHub. 493 00:23:43,700 --> 00:23:46,740 Shane: And you can argue it's because their quality's going down. 494 00:23:46,740 --> 00:23:50,820 Shane: You could argue it's because they just have so much more usage now than they maybe 495 00:23:51,140 --> 00:23:52,580 Shane: had ever experienced before. 496 00:23:52,580 --> 00:23:54,500 Shane: And so that could also be part of it. 497 00:23:54,500 --> 00:24:00,020 Abhi: I'm looking at their uptime monitor right now and there is nothing that is four nines. 498 00:24:00,020 --> 00:24:02,580 Abhi: So except for Copilot model providers. 499 00:24:02,580 --> 00:24:03,780 Abhi: That's 100% uptime. 500 00:24:03,860 --> 00:24:05,860 Abhi: But because no one uses it. 501 00:24:06,559 --> 00:24:14,480 Shane: So this next section we're calling agents are customers now because Stripe introduced Link, and Link is the wallet for agents. 502 00:24:14,480 --> 00:24:17,760 Shane: It lets you securely empower agents to spend on your behalf. 503 00:24:17,760 --> 00:24:20,240 Shane: Your payment credentials are never exposed, and you approve every 504 00:24:23,000 --> 00:24:25,160 Abhi: Well last week was Stripe Sessions here in the city. 505 00:24:25,160 --> 00:24:35,320 Abhi: So there were a bunch of people in town going to Stripe Sessions and I think this and integrations were the two popular things uh that people took away. 
506 00:24:35,760 --> 00:24:41,679 Abhi: And there are a lot of agentic commerce companies in town too trying to like, you know, throw their hat in the ring. 507 00:24:41,679 --> 00:24:43,440 Abhi: So the space is about to blow up. 508 00:24:43,440 --> 00:24:47,440 Shane: This is an announcement from Cloudflare, uh, and it was April 29th. 509 00:24:47,441 --> 00:24:50,840 Shane: But starting today, agents can now be Cloudflare customers. 510 00:24:50,840 --> 00:24:58,360 Shane: They can create a Cloudflare account, start a paid subscription, register a domain, and get back an API token to deploy code right away. 511 00:24:58,360 --> 00:25:01,880 Shane: So your agent can now deploy your code for you. 512 00:25:02,040 --> 00:25:03,720 Shane: It's actually kind of useful, I think. 513 00:25:03,720 --> 00:25:04,200 Abhi: Pretty cool. 514 00:25:04,200 --> 00:25:05,240 Abhi: Especially for DevTools. 515 00:25:05,240 --> 00:25:09,800 Abhi: Also know Michael from WorkOS wants this reality to exist for every product. 516 00:25:09,800 --> 00:25:10,120 Abhi: So 517 00:25:10,440 --> 00:25:11,000 Abhi: We're on our way. 518 00:25:11,000 --> 00:25:13,159 Shane: Yeah, yeah, I mean WorkOS has been talking about it for a while. 519 00:25:13,159 --> 00:25:16,600 Shane: Yeah, should be able to just sign up for an account or do things on your behalf. 520 00:25:16,600 --> 00:25:18,200 Shane: And we're getting there 521 00:25:18,201 --> 00:25:23,640 Shane: In a similar vein, this is from Doola and I guess it's integrated with Replit. 522 00:25:23,640 --> 00:25:28,760 Shane: It says you can now form a US LLC without leaving the AI chat you're already in. 523 00:25:28,761 --> 00:25:32,080 Shane: So you can essentially form a your agent can form a business for you. 524 00:25:32,080 --> 00:25:33,840 Abhi: Off your slop that you just wrote. 525 00:25:33,840 --> 00:25:34,240 Abhi: Yeah. 
526 00:25:34,240 --> 00:25:40,320 Shane: It's like you could generate an app, maybe it's good, but you can create a business for it all without ever leaving Replit, which is 527 00:25:40,460 --> 00:25:40,940 Shane: Kind of cool. 528 00:25:41,100 --> 00:25:41,340 Abhi: Yeah. 529 00:25:41,340 --> 00:25:44,380 Abhi: And then you can like sign up for Cloudflare and get your business going, right? 530 00:25:44,540 --> 00:25:46,860 Shane: I'm worried about like people making it too easy. 531 00:25:46,860 --> 00:25:48,940 Shane: There's a lot of things you have to do once you form a business. 532 00:25:48,940 --> 00:25:49,340 Abhi: Seriously. 533 00:25:49,500 --> 00:25:52,700 Shane: Do your taxes every year and like spinning down a business is actually kind of hard. 534 00:25:52,700 --> 00:25:53,020 Abhi: And 535 00:25:53,460 --> 00:25:58,340 Shane: Changing equity of an LLC if you ever needed to is actually depending on where you're uh registering. 536 00:25:58,340 --> 00:26:01,220 Shane: I'm sure it registers a Delaware company, so it's a little easier. 537 00:26:01,220 --> 00:26:02,740 Shane: Should we use this and create a business? 538 00:26:02,740 --> 00:26:04,500 Shane: Dude, I don't need another uh I think we're good. 539 00:26:04,660 --> 00:26:05,540 Shane: I don't need another tax 540 00:26:05,760 --> 00:26:07,200 Shane: Another business on my tax return. 541 00:26:07,200 --> 00:26:08,080 Abhi: Yeah, that's true. 542 00:26:08,080 --> 00:26:10,880 Shane: Um it is cool to, you know, lower the barriers. 543 00:26:10,880 --> 00:26:12,720 Shane: Just be careful if you're forming a bunch of businesses. 544 00:26:12,720 --> 00:26:15,120 Shane: There's some costs associated with that. 545 00:26:15,121 --> 00:26:17,740 Shane: Gumloop agents now have their own email addresses. 546 00:26:17,740 --> 00:26:22,460 Shane: So you can email them, CC them, loop them into conversations, let them start their own threads. 
547 00:26:22,460 --> 00:26:24,460 Abhi: I think email is going to become a solid primitive. 548 00:26:26,060 --> 00:26:27,100 Shane: What's the use case? 549 00:26:27,100 --> 00:26:34,780 Abhi: If you want to have your agent represent you, like in conversations with others, you could use that email address as the one that you give out. 550 00:26:34,780 --> 00:26:36,780 Abhi: AgentMail makes this easy. 551 00:26:36,781 --> 00:26:37,740 Abhi: So shout out to AgentMail. 552 00:26:37,740 --> 00:26:45,180 Abhi: But let's say if you want to email your agent, I mean you could just talk to it, but like your email inbox becomes like a queue, right? 553 00:26:45,181 --> 00:26:45,980 Abhi: So that's pretty interesting. 554 00:26:46,059 --> 00:26:48,700 Shane: I only push back because I hate email and I'm terrible at it. 555 00:26:48,700 --> 00:26:52,139 Shane: Because it feels like someone else is populating my to do list. 556 00:26:52,139 --> 00:26:55,340 Shane: But if it's my agent's to do list, maybe I don't mind as much. 557 00:26:55,340 --> 00:26:57,260 Shane: It's probably a good way for 558 00:26:57,519 --> 00:27:05,679 Shane: external parties to interact with your agent in a way that you know you can process, you know, it gets the email, it kicks off a process, it validates, it does the thing. 559 00:27:05,679 --> 00:27:09,360 Shane: It probably doesn't, you know, you don't let it do dangerous things without approval, of course. 560 00:27:09,360 --> 00:27:10,559 Shane: So yeah. 561 00:27:10,560 --> 00:27:13,960 Shane: I can see the use case. 562 00:27:13,960 --> 00:27:16,200 Shane: Alright, let's talk about some quick hits here. 563 00:27:16,200 --> 00:27:17,720 Shane: Linear Releases came out. 564 00:27:17,720 --> 00:27:18,920 Shane: This was April 30th. 565 00:27:18,920 --> 00:27:21,560 Shane: Manage software releases directly from Linear. 566 00:27:21,560 --> 00:27:24,440 Shane: Track the deployment environment, 
567 00:27:24,441 --> 00:27:28,860 Shane: version and status of every issue to give team members and agents your full deployment context. 568 00:27:28,860 --> 00:27:31,500 Shane: Electric introduced Electric Agents. 569 00:27:31,500 --> 00:27:33,180 Shane: Agents are not compute. 570 00:27:33,181 --> 00:27:34,280 Shane: Agents are data. 571 00:27:34,280 --> 00:27:36,280 Shane: Multi-agent is a sync problem. 572 00:27:36,280 --> 00:27:39,560 Shane: Electric Agents is the first agent platform built on sync. 573 00:27:39,560 --> 00:27:43,160 Shane: Use it to build scalable, collaborative, long-lived multi-agent systems. 574 00:27:43,160 --> 00:27:43,720 Abhi: Congrats. 575 00:27:43,720 --> 00:27:46,360 Shane: Yeah, congrats to friends over at Electric. 576 00:27:46,361 --> 00:27:47,100 Shane: This one is cool. 577 00:27:47,100 --> 00:27:49,980 Shane: I haven't tried it, but I want to because I want to compare it to Suno. 578 00:27:49,980 --> 00:27:56,940 Shane: So ElevenLabs released Eleven Music, a new platform to discover, remix, create, and earn from music. 579 00:27:56,941 --> 00:27:58,740 Shane: Built on the ElevenLabs music model. 580 00:27:58,740 --> 00:28:00,020 Abhi: We're definitely gonna give this a spin. 581 00:28:00,260 --> 00:28:06,180 Shane: So every week we do this, uh we kind of do a recap of the week and we always put together like a Suno song talking about 582 00:28:06,181 --> 00:28:08,720 Shane: All the things we shipped that last week just for the internal team. 583 00:28:08,720 --> 00:28:10,880 Shane: Maybe someday we'll share some of this stuff publicly. 584 00:28:10,880 --> 00:28:12,080 Abhi: But the Mastra album. 585 00:28:12,080 --> 00:28:13,040 Shane: The Mastra album. 586 00:28:13,040 --> 00:28:14,240 Shane: But just internally. 587 00:28:14,241 --> 00:28:15,120 Shane: Now we gotta run it up. 588 00:28:15,120 --> 00:28:18,560 Shane: We gotta A/B test Suno versus Eleven Music and see who wins. 
589 00:28:18,560 --> 00:28:20,000 Shane: xAI has some announcements. 590 00:28:20,000 --> 00:28:23,360 Shane: Voice cloning is now live via the xAI API. 591 00:28:23,361 --> 00:28:32,080 Shane: Create a custom voice in less than two minutes, or select from our library of 80 plus voices to personalize your voice agents, audiobooks, video game characters, and more. 592 00:28:32,080 --> 00:28:32,799 Shane: xAI's 593 00:28:33,160 --> 00:28:34,920 Shane: really trying to compete with ElevenLabs. 594 00:28:34,920 --> 00:28:43,880 Abhi: I guess every frontier wants like a voice model, except for Anthropic, I don't think they have one, but Anthropic was always late to the even like multimodal. 595 00:28:44,380 --> 00:28:49,260 Shane: They were, you know, focused on coding and text kind of, it's probably why they were so far ahead for a while. 596 00:28:49,260 --> 00:28:49,740 Abhi: Yeah. 597 00:28:49,740 --> 00:28:52,940 Shane: Grok Imagine released something called agent mode. 598 00:28:52,940 --> 00:28:58,220 Shane: So it says your entire creative workflow just collapsed into one infinite canvas. 599 00:28:58,221 --> 00:29:03,340 Shane: In Imagine agent mode you can brainstorm, write, generate, and edit images, then turn them into videos without leaving the page. 600 00:29:03,340 --> 00:29:05,500 Shane: So it creates a much more uh 601 00:29:05,760 --> 00:29:08,240 Shane: I guess engrossing creative studio, I guess. 602 00:29:08,240 --> 00:29:10,160 Abhi: There's a lot of cool stuff on Grok Imagine. 603 00:29:10,160 --> 00:29:11,840 Abhi: Like if you go to X you can see it. 604 00:29:11,840 --> 00:29:12,960 Abhi: I think it's pretty powerful. 605 00:29:12,960 --> 00:29:15,679 Abhi: Like maybe it's a response to the ChatGPT 606 00:29:15,680 --> 00:29:16,159 Abhi: Image 2.0, 607 00:29:16,159 --> 00:29:18,639 Abhi: but I don't know what xAI is up to, you know what I mean? 
608 00:29:18,639 --> 00:29:21,919 Shane: Ooh, I'm so Editframe emerges from stealth. 609 00:29:21,919 --> 00:29:23,120 Shane: Agents need video. 610 00:29:23,120 --> 00:29:24,480 Shane: Editframe agent skills. 611 00:29:24,480 --> 00:29:28,240 Shane: You just npm create at Editframe latest. 612 00:29:28,241 --> 00:29:34,320 Shane: And you can prompt Claude Code, Cursor, Codex, or Mastra Code, get a working video or a full interactive GUI. 613 00:29:34,320 --> 00:29:35,840 Shane: We're gonna get Jeremy on the show sometime. 614 00:29:35,840 --> 00:29:37,039 Shane: He's a friend. 615 00:29:37,039 --> 00:29:37,760 Shane: And yeah. 616 00:29:37,761 --> 00:29:39,460 Shane: Congrats on the sick launch. 617 00:29:39,460 --> 00:29:41,139 Shane: Kind of reminds me of like Remotion. 618 00:29:41,380 --> 00:29:47,139 Shane: But the cool thing about Editframe is I was aware of Editframe back when I was doing uh 619 00:29:47,260 --> 00:29:58,220 Shane: platform called audio feed, which is like an audio podcasting thing. I'd learned about Editframe. They have a really cool syntax, just HTML or React to define videos, and so it's actually really easy for your agents to write HTML and React 620 00:29:58,221 --> 00:29:58,480 Abhi: Yeah. 621 00:29:58,480 --> 00:30:00,400 Shane: So it doesn't have to learn a new language. 622 00:30:00,400 --> 00:30:03,840 Shane: It's just like a slightly different format of you know what it already knows. 623 00:30:03,840 --> 00:30:05,920 Shane: And you can generate really cool videos. 624 00:30:05,920 --> 00:30:07,360 Shane: So it's pretty powerful. 625 00:30:07,361 --> 00:30:12,740 Shane: It's very similar to Remotion in a lot of ways, but I think you know the syntax is actually much easier for a human to understand. 626 00:30:12,740 --> 00:30:13,060 Abhi: Yeah. 627 00:30:13,060 --> 00:30:14,100 Shane: And you can get a lot of the same thing. 628 00:30:14,260 --> 00:30:15,380 Abhi: But it was a good reception too. 
629 00:30:15,380 --> 00:30:15,540 Abhi: Yeah. 630 00:30:15,700 --> 00:30:17,780 Abhi: He's been grinding on this for a while. 631 00:30:17,781 --> 00:30:18,980 Shane: I yeah, I mean multiple years. 632 00:30:18,980 --> 00:30:21,460 Shane: And I think originally it was built for humans, right? 633 00:30:21,620 --> 00:30:25,780 Shane: And then I think what they realized, and you know, we'll ask Jeremy when he comes on the show, is 634 00:30:26,060 --> 00:30:30,700 Shane: When did you decide that what you're building wasn't actually for humans and actually was better equipped for agents? 635 00:30:30,700 --> 00:30:36,620 Shane: Because I think there was a switch at some point in the last few years where it became clear that actually this is 636 00:30:36,760 --> 00:30:37,480 Shane: even more powerful. 637 00:30:37,640 --> 00:30:40,760 Shane: Because no one wants to write like HTML video by hand, right? 638 00:30:40,760 --> 00:30:42,120 Shane: It's like kind of tedious. 639 00:30:42,280 --> 00:30:44,600 Shane: DeepSeek input cache price drop. 640 00:30:44,600 --> 00:30:45,559 Shane: It was already cheap. 641 00:30:45,559 --> 00:30:45,720 Shane: Yeah. 642 00:30:45,799 --> 00:30:47,160 Shane: Now it's even cheaper. 643 00:30:47,360 --> 00:30:50,559 Shane: Let's talk a little bit about Ramp's in-house coding agent. 644 00:30:50,559 --> 00:30:52,080 Shane: You mean their AI factory? 645 00:30:52,080 --> 00:30:53,679 Shane: Their AI factory, yeah. 646 00:30:53,679 --> 00:31:01,919 Shane: At one point, you know, it was announced that it was writing 30% of all merged PRs, and today it's 70% of all merged PRs. 647 00:31:01,920 --> 00:31:03,560 Shane: And it goes far beyond just our engineers. 648 00:31:03,560 --> 00:31:09,800 Shane: I'm assuming, you know, it's very similar to what you know if you use Devin, I'm guessing they've basically built their own Devin 649 00:31:09,801 --> 00:31:11,560 Shane: that kicks off and writes their PRs. 
650 00:31:11,560 --> 00:31:18,920 Shane: I would be curious to know, and maybe it says in the data, seventy percent of all merged PRs, does that mean no engineer had to steer it? 651 00:31:18,920 --> 00:31:21,080 Shane: I'm assuming there was steering involved in those PRs. 652 00:31:21,080 --> 00:31:23,640 Shane: Like seventy percent of the PRs were started and then an 653 00:31:23,640 --> 00:31:26,520 Shane: engineer had to steer it a bit to get it to the result it wanted. 654 00:31:26,520 --> 00:31:27,640 Shane: But that's still impressive. 655 00:31:27,640 --> 00:31:31,720 Abhi: So like uh the channels are Slack, Linear, GitHub, and more. 656 00:31:31,720 --> 00:31:33,240 Abhi: So maybe like email and stuff. 657 00:31:33,240 --> 00:31:38,360 Abhi: And then they are focusing on twenty-four-seven automations that run. 658 00:31:38,361 --> 00:31:41,580 Abhi: Maybe it really is a coding agent plus scheduled workflows dev. 659 00:31:41,580 --> 00:31:43,980 Abhi: There are conversations happening with this agent. 660 00:31:43,980 --> 00:31:45,340 Abhi: It's not autonomous. 661 00:31:45,340 --> 00:31:46,940 Abhi: But some things probably are. 662 00:31:46,940 --> 00:31:49,419 Shane: Happy Node 20 end of life day. 663 00:31:49,419 --> 00:31:50,460 Shane: A perfect day to mention. 664 00:31:50,460 --> 00:31:53,740 Shane: I plan to drop CommonJS in the next minor version of Zod. 665 00:31:53,740 --> 00:31:58,460 Shane: So for you TypeScript, CommonJS is going away, maybe 666 00:31:58,480 --> 00:32:00,240 Abhi: Just pin your Zod versions, y'all. 667 00:32:00,240 --> 00:32:02,960 Shane: All right, tell me about TypeScript native previews. 668 00:32:02,960 --> 00:32:06,320 Abhi: TS7, which is a complete rewrite in Go. 669 00:32:06,320 --> 00:32:08,640 Abhi: We're using the alpha or the beta right now. 670 00:32:08,640 --> 00:32:10,880 Abhi: Type check is like ridiculously fast. 
671 00:32:10,880 --> 00:32:13,200 Abhi: Um, but yeah, if y'all want to get 672 00:32:13,360 --> 00:32:15,360 Abhi: on board with the new version of TypeScript. 673 00:32:15,360 --> 00:32:17,600 Abhi: There's still some things that are like left to do. 674 00:32:17,600 --> 00:32:22,960 Abhi: Like all your, you know, TS within all of our editors need to have the right language servers. 675 00:32:22,960 --> 00:32:28,000 Abhi: And so right now it's available on the VS Code Marketplace if you're still using VS Code. 676 00:32:28,001 --> 00:32:31,320 Abhi: And it's available on npm, so I would start using it. 677 00:32:31,320 --> 00:32:38,840 Abhi: Everything is a lot zippier in TS7, especially having struggled with the slowness for so many years now. 678 00:32:38,840 --> 00:32:40,680 Abhi: It is fast, bro. 679 00:32:40,681 --> 00:32:41,740 Abhi: I am super stoked. 680 00:32:41,740 --> 00:32:45,420 Abhi: Kinda sad that it's written in Go, but who gives a shit? 681 00:32:45,500 --> 00:32:51,020 Shane: This was a post from Mufiz, who said they post-trained Qwen3-Coder to fix bugs using an actual debugger. 682 00:32:51,340 --> 00:32:51,740 Shane: Maybe it 683 00:32:52,120 --> 00:32:57,000 Shane: These models that you can like fine-tune or train, you can actually give them a real debugger to use. 684 00:32:57,000 --> 00:33:01,480 Shane: So you're feeding back the information directly into the model, and it obviously increased the solve rate. 685 00:33:01,480 --> 00:33:03,640 Shane: So it's kind of an interesting technique. 686 00:33:03,641 --> 00:33:09,440 Shane: And something I think we're gonna see more of, more people try, which is just actually giving a debugger to your agent. 687 00:33:09,440 --> 00:33:13,600 Shane: NVIDIA announced Nemotron 3 Nano Omni. 688 00:33:13,601 --> 00:33:19,519 Shane: So it's the latest addition to the Nemotron family, highest efficiency open multimodal model with leading accuracy. 689 00:33:19,519 --> 00:33:20,559 Abhi: I met that guy before. 
690 00:33:20,559 --> 00:33:23,600 Abhi: He interviewed me at the GTC thing. 691 00:33:23,600 --> 00:33:24,240 Abhi: That's cool. 692 00:33:24,240 --> 00:33:24,880 Abhi: Try it out. 693 00:33:24,960 --> 00:33:27,920 Shane: Open slide is a slide framework built for agents. 694 00:33:27,920 --> 00:33:29,760 Shane: Prompt your agent, get a polished deck. 695 00:33:29,760 --> 00:33:32,960 Abhi: I'm trying this out this week because I have a workshop on Thursday. 696 00:33:32,960 --> 00:33:35,360 Abhi: I'm going to use this uh this CLI. 697 00:33:35,361 --> 00:33:36,720 Shane: Yeah, I would like to see how it compares. 698 00:33:36,720 --> 00:33:42,960 Shane: So obviously we're flipping through these slides here that were created, you know, just generated from HTML, right? 699 00:33:42,960 --> 00:33:43,440 Shane: So I 700 00:33:43,660 --> 00:33:45,740 Shane: I'm curious how much better it is. 701 00:33:45,740 --> 00:33:47,900 Shane: I know a lot of people are just using Cowork. 702 00:33:48,140 --> 00:33:51,660 Abhi: We were using, what, Google Slides for our workshops and then now 703 00:33:51,860 --> 00:33:55,940 Abhi: The engineers running the workshops are just generating React essentially for slides. 704 00:33:55,940 --> 00:33:56,740 Abhi: It's that easy. 705 00:33:56,740 --> 00:33:58,340 Shane: Yeah, and that's what we do here for the show. 706 00:33:58,340 --> 00:33:58,500 Shane: Yeah. 707 00:34:00,580 --> 00:34:05,860 Shane: So Chris Tate announced AI CLI: generate images, video, and text from your terminal. 708 00:34:05,860 --> 00:34:06,660 Shane: Pipe them together. 709 00:34:06,660 --> 00:34:07,700 Shane: It works with any agent. 
710 00:34:07,700 --> 00:34:11,139 Shane: This is pretty cool because you can generate an image and then pipe that to a video 711 00:34:11,460 --> 00:34:20,420 Shane: uh to uh basically a video model to animate that image. So you can use hundreds of models, multimodal comparison, inline previews, and it uses AI SDK and AI Gateway. 712 00:34:20,421 --> 00:34:21,560 Abhi: Chris Tate's on a tear, dude. 713 00:34:21,560 --> 00:34:22,919 Shane: Just seems like every week he's shipping something. 714 00:34:23,079 --> 00:34:26,280 Abhi: I feel like he's the goat right now of just like tinkerer projects, you know. 715 00:34:26,280 --> 00:34:27,240 Abhi: Shout out to you, Chris. 716 00:34:27,240 --> 00:34:29,240 Shane: Yeah, we gotta get him on the show. 717 00:34:29,241 --> 00:34:31,760 Shane: So flu framework was announced. 718 00:34:31,760 --> 00:34:35,120 Shane: Essentially it's a framework for building your own agent harness. 719 00:34:35,120 --> 00:34:35,360 Abhi: Yep. 720 00:34:35,440 --> 00:34:36,080 Abhi: Looks pretty cool. 721 00:34:36,080 --> 00:34:40,400 Abhi: A lot of the ideas are very uh comparable to what we have going on. 722 00:34:40,400 --> 00:34:40,960 Abhi: And uh 723 00:34:41,359 --> 00:34:44,320 Abhi: It's kind of pushing us to get our harness out of alpha. 724 00:34:44,320 --> 00:34:50,240 Shane: We had this big debate of, uh, 'cause a lot of the harness primitives overlap with our agent primitives, right? 725 00:34:50,241 --> 00:34:52,639 Shane: And so it's just like what what should live in a harness? 726 00:34:52,639 --> 00:34:53,840 Shane: What should be in an agent? 727 00:34:53,840 --> 00:34:55,520 Shane: And that's why we labeled it alpha, right? 728 00:34:55,600 --> 00:35:01,600 Shane: So we just weren't solid on are these the right APIs for what a harness should be. 729 00:35:01,601 --> 00:35:05,900 Shane: But I think we've gotten some clarity and I think we've started to see how others are thinking about it. 
730 00:35:06,220 --> 00:35:07,100 Abhi: We weren't very wrong. 731 00:35:07,100 --> 00:35:08,380 Abhi: It's not a lot to change, actually. 732 00:35:08,620 --> 00:35:09,740 Shane: And that's it. 733 00:35:09,741 --> 00:35:10,279 Shane: That's the show. 734 00:35:10,279 --> 00:35:13,079 Shane: Make sure you're following us on YouTube, on X. 735 00:35:13,079 --> 00:35:15,079 Shane: You can see all the things here on the screen. 736 00:35:15,079 --> 00:35:17,799 Shane: Go give us a star on GitHub if you haven't already. 737 00:35:17,799 --> 00:35:18,119 Shane: We 738 00:35:18,359 --> 00:35:19,960 Shane: Want to say thanks to CodeRabbit. 739 00:35:19,960 --> 00:35:20,359 Abhi: Yeah. 740 00:35:20,359 --> 00:35:21,560 Shane: We've got the sweet office. 741 00:35:21,560 --> 00:35:25,240 Shane: You've been seeing the, you know, great camera angles uh from today. 742 00:35:25,240 --> 00:35:26,280 Shane: Great lighting. 743 00:35:26,280 --> 00:35:26,599 Shane: So 744 00:35:26,700 --> 00:35:32,060 Shane: We're thankful that CodeRabbit lets us use their studio every once in a while, so appreciate it there. 745 00:35:32,060 --> 00:35:34,940 Shane: If you need help with code reviews, check 'em out. 746 00:35:34,941 --> 00:35:41,599 Abhi: And it actually, you know, CodeRabbit has expanded the feature set too, uh, where you could have uh CodeRabbit in your Slack. 747 00:35:41,599 --> 00:35:49,279 Abhi: You can have CodeRabbit understand the context of uh Slack channels and then do work for you, open PRs and stuff, very much like a Devin 748 00:35:49,880 --> 00:35:57,480 Abhi: thing. So I think what they're calling it is like the, you know, a brain, the G brain or the rabbit brain for your Slack channels. 749 00:35:57,660 --> 00:35:59,020 Abhi: So uh check it out. 750 00:35:59,020 --> 00:36:00,540 Shane: Thanks everyone for tuning in. 751 00:36:00,540 --> 00:36:02,460 Shane: This has been Agents Hour. 
752 00:36:02,460 --> 00:36:03,580 Shane: We do this every Monday. 753 00:36:03,580 --> 00:36:04,780 Shane: We'll be back next week. 754 00:36:04,780 --> 00:36:07,820 Shane: We have two amazing guests that should be on the show. 755 00:36:07,821 --> 00:36:09,280 Shane: We'll do the news like we do every week. 756 00:36:09,280 --> 00:36:10,560 Shane: We'll see you next time. 757 00:36:10,560 --> 00:36:11,280 Shane: Peace. 758 00:36:11,280 --> 00:36:12,480 Shane: See ya.