95% of AI projects fail? AI News and guests Sherwood Callaway and Richard from Naptha AI
A recent report says that 95% of generative AI pilots are failing! We have a few guests including Richard from Naptha AI and Sherwood Callaway. We talk about the latest Mastra updates as well as cover all the AI news from the last week.
Episode Transcript
Hello everyone. Welcome to AI Agents Hour. What up? My name is Shane and with me as always is Obby. Today we're going to be talking
about a whole bunch of things. We're going to give some Mastra updates. We're going to talk to Richard from Naptha AI.
We're going to talk with Sherwood. We're going to be going through a whole bunch of AI news. I thought it was going to be like a slow news day. And there's really
not anything major, but there's just a lot of stuff. So, it's a huge list, you know? I don't think there's
anything revolutionary in here, but a lot of stuff to cover. So, I'm excited. There are not a lot of bangers, but just a lot of movement.
Yeah. Well, how's your weekend, dude? It's good. Back in Southern
California visiting my family, so that's chill. Yeah, and it's been good. I'm here chilling and going to come back to SF on Thursday for a meetup. There's an
LLM ops meetup that I'll be going to. Awesome. Also, have you looked at my background?
I know. I was like, is that Ward? It is. Shout out to Ward. Shout out to
Ward. If you're watching this, you know you're on the show right now. Ward's the memory guy. Dude, we're
gonna go see Ward soon. We're gonna see Ward, dude. I'm so stoked. It's nice. You
haven't been to Belgium before, huh? I have not been to Belgium, but I am excited. So, maybe we should tell people what we're doing, why we're going to be in Belgium, and talk a little bit about how they could come hang out with us as well. Yeah. So we are traveling as a group, all the founders, but then there are
many of our homies in Europe coming as well. We are going to AI Engineer Paris, that's on the 23rd and 24th of September. We'll be in Paris, and in Europe I'll be there starting the 14th, and we'll be there roughly around the same time. And yeah, if you want to come meet us, we'll probably be throwing some
type of happy hour in Paris. So if you follow us on Twitter, we'll post it there, and then if you're around or you're coming for the conference and meet up, we'll have our Mastra shirts on so you'll know it's us. Yeah. Yeah. So please follow us and hopefully we can get together and hang
out at least. If you want to meet Ward, he'll definitely be there. Yeah, Ward will be there and he'll have to put the farm aside for a few days, but yeah, there'll be a lot of people from the team there. So excited to, one,
hang out with them, and also meet a bunch of other people in the space that we haven't seen. You know, it's nice to get across the pond, so to speak, and meet some others, not just living in our SF bubble or US bubble. Yeah. Yeah. I'm interested to see what the vibe of AI Engineer Paris is.
Because if it's mostly European stuff, and they don't watch the show, maybe they're not in the know, you know? We'll see. Yeah, we're gonna find out. Uh yeah, last week, for those of you who missed it, we
were live in person. I was in SF, so we can maybe talk a little bit about some of the stuff we did there. We can maybe talk just a little bit about the big release last week and what's coming next. And then, yeah, we do have a first
guest coming on in probably about five or 10 minutes, I would say. Last week was really good in terms of a founder week. What did you think? Yeah, I mean, we were busy. We didn't get to the gym once. We didn't do
anything. No, and this is coming from someone who is pretty religious about going to the gym, and we were so busy last week. I didn't even ask you. Yeah. I think I mentioned it once, like,
"Dude, we got to get to the gym." And then we totally were just too busy. I remember seeing that text and I'm like, "Oh yeah, let me just finish what I'm doing." And then it was like 10 hours later. I was like, "What time is it?" Oh yeah.
Uh, last week was a crazy release week. On Tuesday we attempted to release our v5 compat with the AI SDK. That didn't really go the way we wanted to on Tuesday. Then on Wednesday,
we got it going. So we have support for AI SDK v5 models. In this release we introduced two new methods: one is streamVNext and the other is
generateVNext. We plan to deprecate the old generate and stream methods on agents on September 9th. That's the tentative plan, so there'll be deprecation warnings going out. But streamVNext and generateVNext have
actually been a lot of work. It was really good in hindsight, because now we control the agentic loop via workflows under the hood, and there's a lot of opportunity for us to do more tracing and attach our scorers natively. It kind of opened up a lot of stuff for us that we didn't necessarily know going in. So that's out now. It's in the
latest Mastra versions, and we're trying to hunt down all the issues with it because we are trying to go forward. So if you're using it, please report anything, that'd be great. Yeah, I mean, I think one of the things we really tried to do, and one of the reasons it took us a little longer, is we wanted to make sure we provided as
smooth of an upgrade path as possible. Yeah. So, Mastra works with AI SDK v4 and AI SDK v5. You should be able to transition between the two pretty seamlessly. You know, there might be
some edge cases, of course, which you should report. Yeah. But it provides hopefully an on-ramp to get there without having to do a full breaking change.
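(For anyone following along at home, here's a rough sketch of what that upgrade might look like. The streamVNext and generateVNext names are the ones mentioned on the show; the agent definition, model id, and result fields are illustrative assumptions rather than exact Mastra APIs, so check the docs before copying.)

```typescript
import { Agent } from "@mastra/core/agent";
import { openai } from "@ai-sdk/openai";

// Illustrative agent wired up with an AI SDK v5 provider/model.
const assistant = new Agent({
  name: "assistant",
  instructions: "You are a helpful release-notes assistant.",
  model: openai("gpt-4o-mini"),
});

async function main() {
  // Old methods (slated for deprecation warnings around September 9th):
  // const legacy = await assistant.generate("Summarize the v5 compat release.");

  // New methods discussed on the show:
  const result = await assistant.generateVNext("Summarize the v5 compat release.");
  console.log(result.text);

  // Streaming variant.
  const stream = await assistant.streamVNext("Summarize the v5 compat release.");
  for await (const chunk of stream.textStream) {
    process.stdout.write(chunk);
  }
}

main().catch(console.error);
```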
Yeah, we did a whole retro, I don't know if this is important to say, on whether we're too risk averse or not. And we realized we want to move quickly. We just don't want to break people, because it's the worst feeling in the world being on
the other end: you're working on something and everything gets broken and you're all pissed off. At least that's how I am. So I guess maybe that is the bias that we have, right?
Yeah. Ideally no breaking changes unless it's announced well in advance. You know, we're not in a major yet, so we could still break things, but we try really hard not to. Yeah. It's a good habit for an
engineering team to build. Yeah. In other news, we had an event last Wednesday at the YC office. You were kind of busy and didn't get to
attend the whole thing, but you did show up for some of it. But yeah, it was a great event. It was called Context Engineering. I think they
actually released the videos today, I believe, on the Y Combinator YouTube. Yeah, maybe grab the link, we can drop that in. But if you want to see, you
know, Sam, our other co-founder here, wrote this book, which we always talk about (go to mastra.ai/book to grab your digital version), and he gave a talk. I mean, there were a bunch of talks on context engineering. Dex from HumanLayer was there, Jeff from WorkOS, and a few others, just talking about,
you know, what is context engineering, tips and tricks for how to think about context engineering. So if you're interested in learning more about that, go to that thread and you can watch the videos. Sam's talk was really around how we used LongMemEval to improve Mastra's agent memory, because in many
cases memory is just context engineering, right? Memory is a feature of context engineering. So we talked about that. We have a blog post on that. We've talked about it here on the show. So, I'm not going to
spend too much time going into depth on that post, but I think it's a good primer if you are interested in learning more about some tips and tricks. And if you're not familiar with context engineering, you've probably heard it called prompt engineering before. Now context engineering is the new term in town that everyone's
using. The new hype cycle is around context engineering. I do think it makes more sense than prompt engineering. Prompt is very specific. Context is at least a little more broad, covers more things.
I think it's here to stay for a while. I think this term will stick around for a bit. Totally. Especially if people are, you
know, adopting it. I know Dex has his context engineering philosophies that he was sharing. We should get him on the show. Dex, if you're listening, I really want to go deep on what you were talking about, because I believe it, but I don't believe it all the way. Yeah. So, it smells a little fishy.
It sounds too good to be true, but I want to see it. We should get him on the show and have him try it out with us. Like, show us a small example of how it works, of how he thinks about it. So,
for those watching, I would highly recommend you go check out Dex's talk. It was very good. It's in that Y Combinator thread. He talks about the idea, you know, his big
claim is he hasn't looked at code in six weeks. That was one of his big claims, because the team aligns on a spec, or basically a research report and then a spec, and those things essentially get committed to your codebase. So if you review that, you don't have to review every line of code. Now, I doubt he hasn't looked at any
code. I don't believe that. But that's the claim. Yeah. But I think his idea is that if
you spend enough time iterating on the spec, and it's detailed enough, you can trust the code to implement the spec, or trust your coding agent to implement the spec. The one analogy he used that I think did kind of hit home with me: let's
assume agents are something you could just trust; if you wrote the spec, they could write the code. Exactly. And you could think
of an agent like a compiler, right? In the Java world, that would be like committing your jar file, the actual compiled bytecode, to your repo rather than the spec that got you there. So I see some value in doing it, and I do think that as engineers we should be looking at some
of these different techniques, because some of them are going to work long term. Maybe they don't completely work today, but if you're trying them out, and your team is adopting some of these new things and figuring out ways to become more efficient, that's typically a good thing. Yeah. And it's typical that we're
in the part of this cycle where you want to abstract. We've all been prompting for the last year, right? And all of us have different strategies and we're comparing notes, but then we're almost getting to a point where some people just want abstraction on top of the prompts. Like, I just want it to behave
like how you prompted all that time without me having to know. And I think that's where like this will continue on, you know? Yeah, absolutely. Well, with that, should we bring on our
first guest? Yeah, let's do it. So, before we do, this is live. You
might be watching this live. You might be checking out the podcast after on Apple Podcasts, Spotify, all the other places that we are. You might be watching this on LinkedIn or X. But please go ahead and make sure you
subscribe to the YouTube if you are not already subscribed. And if you want to give us a five-star review, I think that's a good idea. Go to Spotify, go to Apple Podcasts. If you want to give us four or lower, you
know, you might want to find better things to do. That's all I ask. You don't have to give us a review, but if it's a five-star, please do. We appreciate it, and it goes a long way toward helping others find this.
And with that, we're going to bring on Richard, and we're going to talk about what Naptha AI is and a little bit about his story. So, Richard, hey, welcome. Hey, Obby. Nice to meet you. Welcome. Excited to be here. Yeah, nice to meet you. Ashwin from
our team set this up and said that you'd be a very interesting person to talk to. So I did a little research and I'm definitely interested. But for those watching, can you give us just a quick little background?
A background about myself? Yeah, just about you, and then we'll probably talk about what you're doing. Yeah. So, my background is in academia originally. I did a PhD and a couple of postdocs in machine learning,
worked as a machine learning engineer in big tech for a few years, always in generative AI, working on things like digital avatars. I did a lot of orchestration work for some of the pipelines. You can imagine with digital
avatars there are a lot of different models you need: one for audio to text, one for lip sync, one for image to video. Orchestrating all of these models together in a pipeline was a big thing. So I worked a lot
with workflow orchestration tools and that sort of thing. I always wanted to do startups on the side. I think Naptha is my third startup at this point. It's the only one that's fundraised, so we've gotten the furthest. But
yeah, we work on a bunch of different stuff. Honestly, we're just trying to find product-market fit. But broadly, we're building products in the agent space. So we tried to launch an MCP hosting product called AutoMCP.
Then that became a super competitive space, with Vercel and AWS moving in, so we decided to move away. We built an auth product to make it easier to connect agents to different data sources and integrations. And now we're
working on a platform that makes agents work better with dev tool libraries and platforms. So you can log into our SaaS app, deploy a streamable HTTP MCP server, and that gives you a small number of tools that just make agents work better with your libraries. There are a few different things that agents don't
do well. If you add docs to Cursor doc search, like add the Mastra docs and ask it to generate some examples with Mastra, it might use outdated versions of the library. It might try to import deprecated functions. Mintlify is a bit better, but still not perfect. So one of the
things we're working on is improved doc search as a tool. We also have platform API tools, tools for submitting customer support tickets, and, actually inspired by the Mastra MCP course that you put up on Hacker News, we created a tool for doing these kinds of guided installation workflows, or really any guide in the
developer docs on our platform. You can convert that into an interactive course which can then be delivered through the code editor. So that's what we're working on right now.
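(For a concrete picture of the kind of thing Richard is describing, here's a minimal sketch of a streamable HTTP MCP server that exposes a single doc-search tool, using the official MCP TypeScript SDK in stateless mode. The tool name, the canned result, and the route are illustrative assumptions; Wizard Kit's actual implementation isn't public.)

```typescript
import express from "express";
import { z } from "zod";
import { McpServer } from "@modelcontextprotocol/sdk/server/mcp.js";
import { StreamableHTTPServerTransport } from "@modelcontextprotocol/sdk/server/streamableHttp.js";

// Build a fresh server per request (stateless mode), registering one hypothetical tool.
function buildServer() {
  const server = new McpServer({ name: "docs-demo", version: "0.1.0" });
  server.tool(
    "search_docs",
    { query: z.string().describe("What the coding agent is looking for") },
    async ({ query }) => ({
      // A real implementation would query a versioned docs index here.
      content: [{ type: "text" as const, text: `Top docs results for: ${query}` }],
    })
  );
  return server;
}

const app = express();
app.use(express.json());

app.post("/mcp", async (req, res) => {
  const server = buildServer();
  const transport = new StreamableHTTPServerTransport({ sessionIdGenerator: undefined });
  res.on("close", () => {
    transport.close();
    server.close();
  });
  await server.connect(transport);
  await transport.handleRequest(req, res, req.body);
});

app.listen(3000, () => console.log("MCP server listening on :3000"));
```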
Yeah, that's dope. That's really awesome. I felt like it even added more validation too: Anthropic released the learning mode for Claude Code, which is just a sign that I think we are on to something here. And I think that is a very good use case of MCP plus agents in general, not just do the thing for me, but teach me how to do it, you
know. So yeah, that's really awesome. So how does someone get access to something like this? Are you still in beta, or is it accessible to people? Ultimately, I want to see it,
you know. So, how do people that are watching this get to see it? We just put a very basic landing page live. It's called wizardkit.ai.
So if you go there, you can check out a demo video, which we did for the Next.js getting started guide. We're currently trying to demo to a few big developer tool and platform companies to go into a pilot program for two to three months. And what we're trying to do with them is
to show that we can increase activation and retention using these kinds of in-editor guided walkthroughs. This is the website, right? This is it. All right, wizardkit.ai.
Very cool. We just used Lovable for the landing page, but the demo video is really what you want to see. Okay. Well, we'll click through. We'll watch the whole thing. I'm CTO here at Naptha. In this video, I'm going to go through an interactive agent
powered onboarding and installation experience to showcase what our solution has to offer. To get started, I've created a mock documentation page with an installation button. The installation process looks a little bit different for every MCP client, but for the purposes of this video, I'm using Cursor. So, when a user clicks install, they will be linked into Cursor and prompted to
approve the installation. Again, it looks a little different for different clients, but that's out of the scope of this video. Also note that in this link we have an optional tracking parameter ID. So on the client side of your doc site you can template your user's Mixpanel, Google Analytics, or PostHog tracking ID
into that link in the button. This will let us track the user from your doc site to when they install the server, even before they authorize, and this can then help us determine if they convert later on down the funnel. So I'm going to go ahead and click install. Once that's done, the user will be taken through an authorization flow as soon as they click login. For this demo, I'm using a
branded version, but we are capable and happy to proxy to your authorization server so that your end users see your branded experience instead of ours. Once the user has completed authorization, they'll be bounced back into Cursor and the server's tools will be enabled. The great thing about this is that once the user completes this, a,
they've been deanonymized, and b, if we're proxying your authorization server, they've now signed up for your platform if they haven't already. And we'll go through an example of how deanonymization works in the dashboard later. Once the user is connected, all they have to do is tell the agent to start the walkthrough.
It's worth noting that we can have multiple walkthroughs attached to a single MCP server. And since it's remote, you can modify and update packages on the fly without having to redistribute packages. If there are multiple walkthroughs, the agent can help the user pick which one is most aligned with what they're looking for, whether that's a walkthrough, an interactive course, an agentic installer wizard, or something else. From there,
it's just an iterative step-based approach. Right now, it's configured to ask the user for feedback and to confirm that they're ready to move on with each step since this one's structured like a course. But we can also have a more kind of hands-off semantic installer where the agent is given instructions to dig through the user's codebase and then, for example, to install a library or
integrate an SDK or instrument an application. So, we can see here that it's going through the beginning of the getting started guide. It checks the system compatibility automatically. It
runs a bunch of commands. It'll install the Next.js app and guide the user step by step in the editor as a kind of human in the loop to confirm various actions, including skeletoning the project, writing code, guiding and educating the user, and teaching them how the project's organized. So, the great thing
about having all this in the editor compared to toggling back and forth between the editor and a doc page is that there's a lot less friction, less context switching. Agents are also really good at following these instructions. So, when a particular walkthrough is really kind of command and edit heavy, um, agents tend to
follow the instructions much more consistently than humans do. The user also just has to do less to get up and running. It's less tedious, it's faster, there's less friction, all really important things for onboarding new users into a solution. So, we can see here that it went ahead
and skeletoned everything out. It's just kind of... Okay, let's pause there. Awesome. Yeah, the next part is like the
dashboard. I don't know if you want to show the walkthrough manager or something at some point. How do I get... So right, maybe go here.
So this is where we would track the metrics, for example. You know, how much of the course they completed and that sort of thing. And then if you go on a bit later you can see the walkthrough manager as well. Yeah, maybe try play. I think this is where
you can see all of the users that have interacted with your MCP server. Yeah, these can be reordered and so forth. And so we have a whole editor for creating your different walkthroughs, and user management. So as I noted, we're able to track an anonymous user from your doc
site to when they install the MCP server, even before they authorize, if you use that optional tracking parameter that I showed you at the beginning of this video. And then once that user authorizes, they're deanonymized and you get all this information about the user, right? So this is an example of just an anonymous user. You know, we can see some sessions, their
anonymous user ID. Once that user authorizes, we get their email address, their name, sometimes their profile photo. And so we can get you all this. (The agent is kind of prompted to do it.) This is another great way to get more insight into your funnel, more data on your users.
Okay, cool. For the walkthrough manager, we basically take your docs and then we have semi-automated the conversion to an interactive walkthrough. So you can, in theory, take any guide in your docs, put it into our platform, and it will create this interactive in-editor guided walkthrough. Very cool. This
definitely needs to exist, because I can just tell you from when we built the course: we kind of just hardcoded everything, which is what you do the first time you build something, right? You just kind of hardcode it, and it works for us because we figured it out, we got it working, it works pretty well. It could probably work better,
right? There are techniques and tips, because you're kind of getting the coding agent to do what it's not necessarily supposed to do, or not intended to do, but you can kind of guide it in the right direction. But I am certainly curious how it all works under the hood, of course, but also interested in trying it out on some little
things, because I think something like this... I actually saw someone else tweeted at me today, the guy from CMU. Someone from CMU has built something similar, it sounds like for workshops he was running locally at CMU or something. So I think this idea is going to take off.
Yeah, I think there's going to be a bunch of interesting use cases for it. Like another one: we just released Stackbench, which we're going to open source. It's a benchmark for how well agents use different libraries. And we want to test, like, Cursor
coding agents, for example. And to be able to automate the tests with Cursor coding agents in the editor, something like an in-editor workflow or automation tool is going to be super useful. So we can basically use these in-editor courses to test these coding agents across a bunch of benchmarks in this deterministic manner, which is going to be super cool.
Yeah, I can't wait to see the data on them. So right now, though, users still have to produce great content for this to work, right? Because it's still content and code; the content is still very important to the process. Are there a lot of people...
Well, we've talked to people, and a lot of people's content and docs are not in any good format for education. So are people going to have to do some transformation first, or how's that going? I guess we kind of assume you already have the docs, and then we convert that into an in-editor course. I know
there are a bunch of people trying to autogenerate docs from code and even from Slack messages in team channels and that sort of thing. That's not a problem that we're trying to solve, but hopefully someone else helps us out with that. Yeah, for sure. So, do you have any users yet? No is the answer. We're
still in private beta. And we're at the late stage of trying to close a few pilot customers, basically five to 10, that we can run this live on their docs for two to three months and just demonstrate this increase in activation and retention. So hopefully we'll close our first one soon. Yeah.
Yeah. If you're out there and you have some docs and you want to try to create an MCP course from them, or build an MCP doc server, it sounds like Richard is the person to reach out to. So, I know you've obviously been building this. When I was building the MCP course, I found it
was kind of tricky to get the right amount of content per step, because in your walkthrough you had steps, right? Have you learned anything about how much content should go in each lesson or step of a walkthrough? Because I know for us it was significantly less than I wanted it to
be. I wanted to just give it a whole bunch of steps and say, you're the agent, you figure it out. But I found the best results when I would limit it down. I basically set a roughly hard rule: if my markdown file is more than 100 lines, I'm trying to do too much and the agent's probably going to fail. But that was just me playing
around and prompting and seeing what worked. I'm curious what has worked best for you all. Are you still figuring that out? Yeah, I think we're still figuring that out. I mean, it's kind of
semi-automatic, as I said, which means there's a lot of human fine-tuning to make sure that the length of each step works and that sort of thing. So, yeah, we're playing around with this. And it's an interesting question of how deterministic your workflow should be versus how
much autonomy you should give to the coding agent. Finding that balance, I think, is going to take a little bit of study. Yeah. And the thing that's challenging is every agent is slightly different in how well it handles
different amounts of context. So, yeah, I think that's why having the benchmarks, being able to run it against multiple different agents, could be really useful: to say, if you want an agent to teach you, here are the best teachers. It's almost like, these are the best professors for what you want to know. Yeah. And I suppose it could vary based
on the topic or the content as well a little bit, but overall it would be nice to have some kind of system to say these are the best ones to use if you want to actually learn something. One of the things I want to add to our benchmark is tasks for how well agents install things. I've done a bit of research into this, and there are a few different data sets
for agents setting up an environment and that sort of thing. But again, it's super underexplored. So yeah, hopefully on Stackbench we can have all of the different coding agents and see how well they do on all of the different installation tasks and that sort of thing.
Yeah, we wish we could have benchmarked a little bit, because we had so many issues that were not even issues with the course. It was just, hey, this doesn't work in VS Code properly, or the agent did something different in Cursor versus Windsurf when it was executing this part of the lesson plan, and all that stuff. The content's
the same. Nothing changed except the editor you're using, and it sucks for those users too, because they could have had a good experience but they didn't, you know. Yeah. It's really underexplored. Given that so many developers are using coding agents these
days, you'd think that developer tool libraries and platforms would know more about how coding agents use their library, use their API, or install their packages and that sort of thing, but no one has really started thinking about it yet. And I think this is going to become much more important. We're just getting to the stage where people are
starting to think about GEO, generative engine optimization, or whatever: how likely is a coding agent to recommend a certain observability tool versus another? These are all going to be things that developer-focused companies are going to care more about, and they're also going to care about how well coding agents use their libraries.
Yeah, totally. Yeah. So, share the data when you get it. We'd love to see it.
Yeah, we'll definitely talk about it on the show. Yeah. Fantastic.
Yeah. Anything else that you've been working on or want to share? What else should people know about you and Naptha? Yeah, I guess Stackbench is where we're trying to benchmark a lot of the coding agents. So, check
that out. That's at stackbench.ai. It's currently a SaaS app, but
we're going to open source it and give people the ability to run it locally, maybe this week, but definitely by next week, and you'll be able to test it with your Cursor agent and that sort of thing. That actually got to the front page of Hacker News last week when we put it out there. So check that out. Yeah. Also,
just check out Wizard Kit and reach out if you want to try it out, or if you're a developer-focused company that is interested in using it. But yeah, apart from that, we're trying to use Mastra for a few things in our back end as well. For our doc search tool,
we're working on using Mastra. It's just a simple contextual retrieval pipeline, but Mastra saves us a lot of time so that we don't have to implement custom chunking logic and all that sort of stuff. And it would be really cool to use Mastra to do the automatic course generation at some point in the future. It would be great. Yeah. Yeah, it would be great. Help us with our
installs, bro. Yeah. There you go.
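(For a sense of what that kind of doc-search pipeline looks like in practice, here's a rough sketch of chunking and embedding a docs page with Mastra's RAG helpers. It assumes the MDocument API from @mastra/rag and the AI SDK's embedMany; the chunking strategy, sizes, model choice, and the missing vector-store step are all placeholder assumptions, not Naptha's actual setup.)

```typescript
import { MDocument } from "@mastra/rag";
import { embedMany } from "ai";
import { openai } from "@ai-sdk/openai";

// Chunk and embed one markdown docs page; a real pipeline would then upsert
// the results into a vector store (pgvector, Pinecone, etc.) for retrieval.
async function indexDocsPage(markdown: string) {
  const doc = MDocument.fromMarkdown(markdown);

  // Strategy, size, and overlap are illustrative values.
  const chunks = await doc.chunk({ strategy: "recursive", size: 512, overlap: 50 });

  const { embeddings } = await embedMany({
    model: openai.embedding("text-embedding-3-small"),
    values: chunks.map((chunk) => chunk.text),
  });

  return chunks.map((chunk, i) => ({ text: chunk.text, embedding: embeddings[i] }));
}
```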
We can benchmark how well coding agents do it too and let you know. Yeah, pretty cool. Yeah, that'd be cool.
I mean, that is one of the challenges, right, of a dev tools company: ensuring that people can actually use your tool. If people can't use your tool effectively... how many times, Obby, have you and I joked about Tailwind and trying to get an editor to write the right version of Tailwind? Versioning is so hard, and making sure that it's
using the same version as what's in your project is hard. And there are other MCPs, like Context7, an MCP server that has a lot of docs. It's good that it exists, but I also found limited value in it, because again, it's kind of generalized for everything. And it seems like more specific docs
would perform better, right? More specific MCPs targeted at the specific tool you want docs about. Of course, then the question is it has to be easier to install and add these MCPs, or maybe auto-discover the ones that you should have for your project. There's a whole bunch of things that I think you could do to make it easier to add these things when every
dev tools company has one. Then maybe it becomes a little easier to find and turn them on, almost like a really good marketplace of discoverability for finding these tools. Yeah. And everything is anecdotal at the moment. Like, I ask people how well Cursor doc search works versus
Mintlify, and they're kind of like, oh, well, Mintlify kind of works better, and then people bring up Context7, and yeah, we just need to benchmark these things across lots of different coding agents and see how they work and see the numbers, and then we can figure out how to improve it. Yeah, totally.
All right, Richard. So the best place to find you, besides the websites you shared, Wizard Kit? Yeah, check out wizardkit.ai. Check out stackbench.ai. We do have a main
website, naptha.ai, which we're updating. As I mentioned, we're building a lot of different products and launching landing pages for them, so our main website has been out of date, but it should be updated by the
end of the day. So you can check out naptha.ai as well. And if you want to reach out to me, I'm Rich.
Awesome. Well, we'll definitely bring you on again once you get those benchmarks out.
You know, we'd love to chat through them once you get some real data that we can dig into, because I know it'd be super useful to be able to look and compare across multiple different coding agents. Yeah. Awesome. And yeah, maybe I can share some results about what's missing in the Mastra docs or what's
not working well in them. Come on and roast us, you know, we're here for the roast. Great. Yeah. Yeah. All right, Richard. It was great chatting with you, and for everyone
else, go check their stuff out. Thanks, guys. See you.
See you soon. Bye-bye. All right, dude. That was interesting.
Yeah, it's kind of funny, because it's the second time now that someone's building platforms around this idea of a course as an MCP. And obviously there's more to it than just that, but the course part is interesting to me because I spent a lot of time on that damn course. Yeah. Yeah. We did it for our tenant, and
they have it single- or multi-tenant. It just shows we're going in the right direction, though. Yeah. I mean, and why not, I'm going to give a plug for the course, because, well, we have the captive
audience. If you haven't checked it out, go to mastra.ai/course. We've had over a thousand people sign up
already, which is cool. Over a thousand people have gone through the course. So more should do it and provide feedback and tell us how we can make it better, because I know there's a lot more we could be adding to it.
What? Well, should we bring on another friend? Let's do it. All right.
So, I feel like a lot of people might know Sherwood if they're in the AI space in SF, but for those that don't, they get to meet him now. Welcome. Oh, it's good to see you guys.
How's it going, dude? Thanks for having me on. Yeah, we have a lot to talk about. Yeah, I mean,
I'm sad. I was in SF last week and I didn't get to see you, you know. We keep missing each other, dude. Yeah, it's like I send you a text and you're like, "How's this week?" And I'm like, "Not there that week." Maybe if I
just lived there, we'd hang out more. But no, you're not very available. Sam's not very available. And
then Obby's one of the agent boys. One of the agent boys, dude. Yeah. I was out of town this weekend for a wedding, so I wasn't able to link up with Obby, but you know,
we're overdue for a drink. So yeah, definitely. So let's get into the journey, dude, because I don't know if you've ever talked about it publicly. Obviously in private, but for the audience, tell us about your founding journey thus far.
For sure. I actually sort of went viral recently for a tweet related to YC, so I feel like a lot of people might be interested in hearing the story, or me going into a little bit more detail. Yeah, let's do it. Just to provide a
little bit more context, I studied history undergrad and I'm from North Carolina, and I found my way into tech and startups by doing a web development boot camp during college. It was 2014, and web development boot camps were pretty new. I found this boot camp in a Forbes article, needed something to do over the summer, and so I
packed up for San Francisco and spent three months there hanging out with people who were just learning to do web development. At that time it was a Rails boot camp. I remember, and this will give some perspective since your audience is obviously super interested in TypeScript and JavaScript,
Node came out that summer, or at least Node felt like the hot new technology in the summer of 2014, and the idea of writing JavaScript on the server side was exciting and new. So from there I basically didn't look back. It was one of those moments where you're like, I'm not really sure where I
want to go with my life, and then suddenly you find something that just perfectly resonates with your interests and your skills. So I basically decided I wanted to go into tech and tech startups, to be an engineer at a venture-backed tech startup, and ultimately to start my own. And that summer a lot of people were reading Hacker News and talking about YC. Some folks spun out of the program
and started companies. Other folks applied to YC and a handful of them got in. So there was a lot of excitement around that community in particular. It had been my dream for a long time to do a YC venture-backed tech startup.
I made my way back to San Francisco after graduation and ended up working at Crunchbase and then joining Brex relatively early. I think I was the 70th employee at Brex and one of the first couple of infrastructure engineers there. And then while at Brex, my friend on the front-end engineering team and I relocated to New York to help start Brex's New York
office. And this was in maybe October of 2019. So, it was like very poorly timed.
We were all excited because the leadership team was moving out, or the leadership team had bought a condo in New York, and it really seemed like there was going to be kind of a center-of-gravity shift. And at the time I was dating someone who had been working at Stripe, and her team had started another Stripe office in a
different city. I was like, "Wow, this is a really cool idea, to be a part of the landing team for this new office." So we moved out there in late 2019. The office was open for
like three or four months before COVID shut it down, and then we found ourselves kind of stuck in New York with a lease. My co-founder Justin and I worked remotely for Brex for about a year and a half from New York before taking the plunge and applying to YC. And I think that's kind of where things get interesting in terms of our
entrepreneurship story. But that's the setup. That's a good setup, dude.
Did you like New York during COVID? It was super boring. I had a good experience; I think a lot of people had a pretty bad experience. Immediately following COVID, all of the
San Franciscans moved out to New York. So I guess in a way I was at the beginning of that migration, or even a year or two before it. I lived with my best friend and colleague and ultimately my co-founder, so that was a lot of fun. And we had a small
group, like a COVID pod, that would hang out. And we ended up making a lot of good, close relationships in New York, which I think is actually not that normal for a New York experience. You typically go to New York and meet a ton of people, but maybe not everyone is someone you're super close with. In my case,
I got to meet maybe five or 10 people that I ended up being very close with and who are good friends to this day. But yeah, the New York and San Francisco tech scenes are very different. Justin and I ended up building a tech startup for three and a half years in New York, and it was a much more isolating experience than it
is doing it in San Francisco. And was your YC... I guess we can go into YC, but you did remote YC? Yeah, I think the first remote batch was winter
2020 or 2021, and then we were summer 2021. So we were fully remote, and honestly that was a huge bummer, because a lot of what's great about YC is the in-person relationship building and the energy you get from being in the same room with all these other companies, and
the camaraderie associated with it, the in-person demo day, which I think is super exciting. That's a huge part of the experience. We are obviously part of the YC community, we get access to Bookface, we make friends with people like you. But that's
a pretty irreplaceable experience, so unfortunately we didn't get that. Damn. We did an entire fundraise over Zoom. Yeah.
I will say, though, I don't think that transition, the fundraising over Zoom, ever actually left, because I still think that most investor meetings now are still over Zoom. At least that was our experience. And I think the scaling laws just make that easier. But of course there are certain situations where
you get in person with some VCs. So I don't know if that ever went away after COVID. I think COVID probably started it, and then maybe investors are like, I can just get through way more meetings and make decisions faster, so they prefer it. It seems like... Yeah, I mean, it's definitely still here.
I personally don't like to pitch over Zoom. When you're pitching, you're volunteering your time, you're sharing your vision, you're releasing what is effectively confidential information about your company. So
there's an informal contract that you and the investor are engaging in when you take that meeting, and I think the odds are not really in the favor of the founder when they do that virtually. You could have the investor using a notetaker, or the
investor is just taking as many meetings as possible trying to gather information, and it's just a lot less fun. You're a founder who should be focusing on building the company, and you have to spend 30 minutes of the day interrupting flow to beg for money over a video call.
The least you can do is buy me coffee, you know? Let's go to South Park for Blue Bottle or something. So I much prefer pitching in person. I think it's a good thing that you can do it remotely, because it opens up venture fundraising for people who are not based in San Francisco or New
York. That's just my take. I think the current batch is fundraising right now. I think it's
this is the time that they're fundraising, and plenty of them have their whole calendars booked from morning, like 15 hours straight, and most of these investors are probably just poking for information. You know, they're interested in what's going on in the YC batch and stuff. We had similar situations with all these meetings too. Not most of them; I guess for us most
of them worked out, but there were definitely some useless ones in there for sure. Yeah, it's hard to say. I mean, YC definitely seems to have the opinion that the best thing for founders (and there's a whole separate question about whether you think YC has founders' best
interests at heart; we'll put that to the side) is to fundraise during a very compressed time frame, create artificial scarcity,
and book as many meetings as possible. It's a little bit of a numbers game. You need to generate hype, and obviously YC companies do really well and typically raise at a premium, so there's definitely something there.
You know, I really want to know and like my investors. In some cases you get to know them after they've already committed, and that's the relationship-building part, but in some cases you can get to know them before. And I'm working on my second company now, actually, and one of the
main investors is someone that I knew from my first company. So it just goes to show that those relationships compound. Yeah, for sure. Yes. So, keep telling us the journey.
So, you're in New York. You did remote YC. We're caught up to, what, 2020 now? This is
like... Yeah, summer '21. So, we're in New York. And this is, in many ways,
a cautionary tale. If anyone's interested in getting into YC and doing a tech startup, which I imagine a lot of people are, hopefully this will help you. You can take the positive things away and also steer clear of some pitfalls. In my case, Justin and I were
working at Brex, and Justin had done a startup before that had been bootstrapped, so he kind of already had a little entrepreneurship streak, and I had had this vision of doing a YC venture-backed tech startup since undergrad. So we knew we wanted to do something, and we loved working together. We're both technical, from Brex, so it makes
sense that we would be a great co-founding pair. I think you guys have a similar story from Netlify. And at the time I was kicking around a couple of ideas. We were interested in fintech because Brex was obviously a fintech company. We had this idea about
verticalized fintech. You know, all of the neobanks had been done, like Chime. Yeah. And there are like a billion of these neobanks, and there were the HR and payroll startups, and then there's embedded banking, and there's stuff like Checkr, these sort of fintech-adjacent services via API. So it felt like the first
wave of fintech was kind of over, and there was going to be this next wave, which might be verticalized fintech SaaS platforms or offerings. So you can imagine banking for real estate, or banking and expense management for healthcare or for construction, etc. In our case, my dad growing up
was an orthopedic surgeon. So I was like, let's do banking slash billing, something fintechy, for healthcare. My special angle is this access that I have: I can go interview the CFO of the hospital or talk to the staff in the billing department. So
that was the basic premise. We kind of intellectualized our way into what felt like a good business idea. It was very much an MBA case study approach to entrepreneurship.
Then the YC deadline was approaching, and I think we probably saw one of these tweets or emails about the deadline being 48 hours away, and Justin and I were like, you know, screw it. Let's apply and see what happens. We won't get in, because of course we haven't worked on this idea for very long, and we'll
just learn about the application process, harden the idea a little bit, maybe get an interview, and then we'll apply again in the fall with something more serious. But that's not what happened. We got an invitation to interview. We did the interview, and holy... we actually did two interviews. We got
invited for a second interview, and I don't know if that's more common these days. And I remember at the second interview, both Justin and I were still working at Brex, and our partner was Dalton, who's now no longer with YC. We had prepared for it, like, okay, they're going to ask all these follow-up questions about the product and the business and go-to-market, blah blah blah, and at the end of the interview he was just like, I just
don't believe you guys. I just don't believe you guys want to do this. We were like, what? We had spent a week hammering home all these other aspects of the business, and I think we basically just begged. We were like, we want to do the startup. We want to do it together. We're asking for your
permission to do this. If you admit us, we will leave Brex today. And that's what happened.
Nice. And I actually think that's a useful lesson for a lot of people. A lot of times the YCs of the world, the early-stage startup accelerators and VCs, just want to see conviction and commitment. They
care a little bit less about the idea itself and more about the fact that you are totally committed to doing this, because they're really betting on you. Yeah, I mean, there are a lot of stories where the first idea isn't always the one that sticks, right? I think that's very common. You can look at Obby's and my story as well, right? It's like,
the first idea isn't always what sticks. But having conviction in working on something hard together over a long period of time, and actually trying to build something useful, that's how you often find it. Yeah. Yeah. It's almost like they're investing more in us. I guess it's not like they knew dev
tools and were like, oh, these guys, it sounds right, maybe. And then they went and it worked out. Coherence, the coherence of your story, is so important. The story around Mastra makes sense: we don't have language-agnostic HTTP servers.
Yeah. You know, we have Express, and then we have whatever the Python world has. I don't care about that.
It would make sense that that's not how agents play out either. And then, who better to build the TypeScript agent framework than the guys who built Netlify? Yeah. Yeah. That coherence of the story is just so powerful. And I think that
investors just want to hear that. Yeah. I think the story, Gatsby to Netlify to now, is a really good transition. I think it landed really well. Yeah. Yeah.
Yeah. Narrative is such a big part of it as a founder. Sometimes we even have to tell stories to each other to convince ourselves that we need to go in a certain direction, you know? It's all about how that story comes across and whether it resonates. I don't know, a lot of founders are really bad at telling the story, maybe because it's not something that truly
can be done in two sentences, you know? Maybe that's the premise, but then the actual story is probably what people are really interested in, so they don't train... Yeah. What did Mark Twain say? He said, never let the truth get in the way of a good story. Yeah. I'm not endorsing lying to investors,
to be clear, but a lot of times the twists and turns are really interesting when you dig into them. They're interesting to you, they're part of your story, but the main beats are: Gatsby, Netlify, there needs to be a TypeScript agent framework, we're the team.
Full stop. And just learning how to tell that story, and learning what that story is for you, I think is a good skill for founders. So then, for this fintech-for-healthcare thing, what was the story? And did you pivot, or did you go through with it all the way?
Yeah. No, so we worked on Opkit for about three and a half years, which is a long time, I think longer than most seed-stage startups. I credit that probably to my relationship with my co-founder Justin. I mean, we're still
best friends to this day. We enjoyed working together. We felt like we were learning at every step along the way. We felt like we were making progress. And so we were always hopeful at every
step. And then when we decided that this was probably not the best thing for us to work on together, we felt like we had explored all of the idea space and kind of gotten a sense for whether we were the right people to be building in healthcare, whether we were excited about it and passionate about it. We felt like that wasn't true. So we
chose together to pivot the company to an acqui-hire search, and that's what led us to 11x, which has been a good launch pad. But at Opkit, we started off building medical billing software. And this was pre-LLM, or at least LLMs hadn't gone mainstream. I think the launch of
GPT-3.5 and ChatGPT was November 2022, so we were a full year and a half before that when we were working on this, maybe two years before that. And it made a lot of sense:
this was very similar to the banking and expense management products that we had built at Brex. We had built a card swipe product, and we built Brex Cash, their business bank product, and that's very similar to when you go to a doctor's office and you swipe your insurance card and it tells you what your benefits are, what your copay is
going to be, and then the doctor's office collects that amount. Along the way we became subject matter experts in medical billing and electronic transactions within healthcare. It is a horrible, arcane world where you're working with things like SFTP polling. They don't even have
HTTP APIs. Yeah. Yeah. The APIs of
some of those medical technology companies are so bad, or non-existent, right? It's like CSV exports, or SFTP where you had to pull files from some SFTP server. Yeah, I have a little bit of experience doing that and it's just terrible. Yeah. So, the technology was hard. I
mean, the regulatory and compliance dimension was hard. We were HIPAA compliant and SOC 2 Type 2 from demo day onwards. That was very expensive and it slows things down, because every client wants to know if you're compliant. And the flow of funds in healthcare is
quite complicated. It's a little bit unclear who gets paid by whom a lot of times, and the incentives are out of whack. So you would think that a product would make sense, but in fact it doesn't, because this party has this incentive which makes them not want this thing to work. Yeah.
So we got like a master's in healthcare finance, or public health, over three and a half years. We built basically three different businesses. First was this medical billing startup.
Then we pivoted to focus exclusively on electronic eligibility, which is figuring out whether you have benefits for this procedure and whether you're in or out of network, which is a really difficult problem. This was when digital health was really exploding, so we were working with a lot of these online telehealth companies during the COVID era. And then in
order to make that eligibility product work, we had built up a healthcare call center in the Philippines. We had like 10 people in the Philippines who were calling insurance companies full-time and loading in the information that we couldn't get electronically. And
then LLM happens. At first, it's very obvious to us like we're going to be able to use LMS for for like quality assurance. So, we can say does did this agent perform the call well? Did they adhere to the script? And then we used
LMS for for data entry. We automated the part of the their role where they like would extract information from the call and like type it into our system. And we built structured outputs before structured outputs existed for this. Uh
Then we built a vertical voice agent um that could actually complete parts of the call in its entirety. Um I think we were very early in voice. Uh we used this kind of three-step approach that's become pretty common: the transcription step, then the inference step, and then there's the speech synthesis step. We were very early users of ElevenLabs and Deepgram. We worked closely with the Deepgram team to figure out things like VAD um and endpointing. Uh FPY was kind of new at the time. Um and I think that voice is still pretty much unsolved. It's getting better, but pretty unsolved. Um and that was the product that I think we had the most conviction in, the most promise, and the most traction. So we had like a year left to sprint at that pivot. Uh built a pretty good product and brought it to market. Uh I think the technology was just not quite there, we were a little too early, and maybe we had spent too much of our pre-seed round at that stage. So I felt like, ironically, it might have been better for us to shut down and restart and pitch this as a new company and a new idea. I think we would have had more success, but uh we had already kind of decided that we were not the right people to be building agents in healthcare. That wasn't a
passion of ours. So um we started to transition to moving to 11X. Awesome. Yeah. How was the, you know, so you were at 11x for quite a while and, you know, you've recently moved to start something new. So I don't know if you want to talk about 11x or you want to talk about what's new. We can go either direction with it, but definitely want, you know, want to hear what you Yeah, I mean, I think the 11x story sort of informs what I'm working on now. So maybe I can tell a little bit about that. Um 11x, for people who don't know, is an AI sales tech company, and I worked there uh from September of last year until the end of May this year. Um, so like 9 or 10 months, and my team basically came in and built 11X's uh AI SDR product, which is sales development representative, from scratch. Like, Justin and I created the repo, pushed the first commits, all the way to doing like
50,000 emails generated by AI a day. Um, and I think, I don't know how you guys will feel about this, but I think we are or were the largest deployment on LangGraph's cloud product. Uh and in the process of this I remember thinking, I mean, the LangGraph TypeScript SDK has come a long way. Uh I remember being like, this is so hard. There's going to be a TypeScript-specific agent framework. Like, this doesn't need to be this hard. Uh and then Mastra came along, and I think actually that's how I got connected to you guys, because I was like, this is exactly what I think should exist. Um, and while at 11x building Alice, which is the name of the SDR, I felt like I'm building a futuristic product using agents. I'm using agents to build it. Like, I'm using Cursor. Uh, Replit was
kind of new. Um, but all of the rest of my stack is very traditional. The way that I debug issues is very much the same as it was like five, even 10 years ago. Like I'm using Datadog. I'm getting a bug report from like someone in Slack. Maybe it has like an
organization ID or a link or a Loom. A Loom would be crazy. Um, or a stack trace or something like that. And then I just go splunking in these legacy observability tools sometimes for hours
to figure out what actually happened. Um and the contrast was extremely stark. Um I was kind of surprised actually, because I had started the observability team at Brex and I remember all of these technologies from when I was at Brex. I was like, I remember Prometheus, I've debugged, I've set up Grafana, I've set up Datadog, I've done distributed tracing, I've done instrumentation for like microservices architectures before, like I know all of this. I built logging pipelines, and I'd expected it to be different and it was basically the same. Um, and now all of my time was spent on debugging. It used to be kind of like 50/50 coding and debugging. Now
it was like 90% debugging, 10% coding, because I'm so much more productive at coding. I can ship things faster, but then I'm basically as productive as I've always been when it comes to fixing issues in prod. Um, and so I just became obsessed with this idea of building an agent observability platform. Uh, like something that makes
developers 10 times better at fixing production, uh, using the agents that I had built expertise in through building Alice. Um, and creating a platform that's built for humans and has features that are only possible through agents, but also built for agents, like, what does it look like
when the primary user of these observability tools is not actually a human like Shane and Obby, but like the Cursor agent. Um and so agent experience is a component of that too. So um that kind of leads me to where I am today. I'm just in the very early stages of working on an agentic observability platform. Um,
dude, that's sick. Uh, what's it called? Are you allowed to say? Uh, not yet. Not yet.
Um, I'm holding off on it, but, yeah, we're in stealth right now. Uh, I'm really excited about branding and marketing. I think there's a lot of alpha there. Um, and we're going to have a really fun and cool brand. I think it's going to feel very relevant. Um, and I'm super excited to share it with people. I think we'll probably do a public launch in Q1 or Q2 of next year. Uh, but there will be a closed beta
in the next month or two. There's a lot of, so we talked to a lot of um agent teams or AI engineering people, and the thing that they're wanting, or at least think that they want, is they want to get to incident-related things faster without a lot of the buildup of having to search and read or understand stuff, or write runbooks. Like, no one wants to write runbooks anymore. They think, why do I have to write a runbook? Shouldn't the agent just escalate things? But then it's kind of hard to pull that stuff up, because the data sources are, one, in legacy systems, and are all over the place. Like, how are you going to deal with that challenge, I guess? Yeah. I mean there's kind of a bifurcation in observability AI startups
right now. And most startups are building what I call an AI SRE, or an AI site reliability engineer. An SRE is a role that exists in big tech, and Google is probably famous for inventing this role. Um and they're responsible for like owning these observability systems and also using them to help people figure out what's breaking and to resolve and mitigate those issues. Um I think that the SRE role is obsolete and is going away. The idea that every developer, or that we need certain developers who are experts in understanding how production systems are behaving and fixing them, is kind of silly. Um certainly as vibe coding plays out, more and more, basically everyone's going to be a product engineer. Like, I was a former infrastructure engineer. I don't think we need infrastructure engineers, you know. Those feel like vestiges of the cloud era, and we are in the AI era today. Um, the AI SRE companies are building these agents that sit on top of lots of different tools and kind of correlate information and pull information from all the different existing systems, often legacy systems, to help you understand what might be happening in production. And that's great. Uh, it works well at an organization that has a bunch of legacy systems that they can't rip out. But the next generation of companies is not going to have all of these legacy systems. Like, we don't need something that spiders across a bunch of different tools. Like, we just need, uh, your observability platform should be agentic to begin with. And so one of the things I think that's differentiated about my vision is that we are building ingestion and storage for your telemetry. Um so we're going full stack, and I think I haven't seen any other company that wants to do that. Um, I also think that
unlocks a lot of pretty exciting possibilities in terms of what our agent can do relative to other agents in the same space. And uh there are even some cool things you can do at the storage and ingestion layers to cut costs. Um last week we had some people on the show from Cedar OS, and the way they were talking to us is like, these are AI-native applications. Would you consider this part of that kind of group of products that are emerging, that these are built for AI and, you know, designed for that? Yeah, I would. I think that AI represents enough of a paradigm shift that uh we're going to see tools that are like a whole new set of tools that are AI native. Um an example of this is Cursor, right? Like, sometimes enough of the initial assumptions change that you just need to restart. Uh and what are those assumptions? The new assumptions are like, it's not just humans who are using it. It's also agents. Um there are features that are possible, like you can deploy LLMs to create these chat-based interfaces or generative UI or non-deterministic things or do anomaly detection. Um these are like pretty fundamental shifts in what's possible uh and what the conditions are for the next, I would call it the next like 10 or 20 years of software. So I don't think that you could very easily take a company like Datadog, which is built based on a different set of assumptions, and um and
just suddenly, magically make it AI native. I think that we're going to have new platforms. Yeah. I mean, I think one of the things is, you know, those other tools are going to try, right? They're going to add AI features to an existing product set, and then at the same time there will be companies that try to be more AI focused from the beginning, trying to reimagine, if we didn't have those preconceived notions of what this platform actually should be or what this
tool should do, how would we build it from the ground up? And I think that you've seen many examples where, you know, sometimes it can work where the old players add new features, but once you have a vast product set, it's actually kind of hard to implement new features. It's sometimes easier. It's why sometimes as engineers, we often
want to just rewrite everything because it's like, if we could just start over, we could do it way better than we could have done the first time. But you can't do that if you're a company with, you know, thousands or tens of thousands or, you know, potentially more users using your product every day. So I do think there's like this friction of it's actually hard to implement a new feature
into an existing product, versus thinking of something from the ground up, especially when there are paradigm shifts like what AI has kind of done to, you know, all kinds of products, not just observability. Yeah, I think there's basically never been a better time to start a software company, because uh everything's up in the air again. Everything's fair game. Uh it's just because the conditions have changed so dramatically from where we were just a couple of years ago. If you thought we were late, that we were in the endgame and everyone had sort of staked out and won these different markets, that is no longer the case. And so that alone I think is worth betting the next five or ten years of your career on: the fact that this is a big enough paradigm shift that the pieces are being shuffled. Um yeah, I want to be part of the shuffle. Yeah, that's a good way to put it. The
deck is definitely getting reshuffled a bit, and there's a lot of opportunity when that happens. The tricky thing about reimagining, though, and just like us, because I guess we were trying to reimagine LangChain or whatever, so maybe we had prior art, but let's say for your case there's not much prior art. I'm just worried about, like, how do you get adoption on something that's so brand new in the paradigm? Do you have any thoughts about that? Yeah, I mean, there is prior art. I mean, observability has existed for a long time.
Um, observability as a concept kind of emerged in the 2010s as a response to cloud, microservices, containerization. And then we had a whole set of tools that were kind of thought through from the ground up. You could actually argue that that was a paradigm shift that necessitated something like Datadog, which is like a cloud-native observability platform. The big innovation there was distributed tracing, because now all of our architectures were containerized. And that's, like, no longer true; we don't, you know, build all software on Kubernetes and Docker anymore. Um but there was a lot of first-principles thinking that happened during that era, um, about, you know, what are the three pillars of observability, what's the role of an SRE, how DevOps should function within an organization. Um and so that all exists as a reference point. I think developers understand what observability is today, and I think they also probably have good reference points for the ways in which toolchains
are changing because of AI. So in a way you can just come to someone and say, hey, your IDE has changed, the types of apps that you're building have changed, the way you build software has changed, and the way you are going to deploy software and monitor it in production is also changing too. Um yeah, I do think there's absolutely an education component to it, you know, to some extent, but at least you're not coming up with something that people can't imagine, right? If it's something brand new, people can't imagine it, and there's even more of an education lift. But I think people know what observability is. People in the tech world are relatively more likely to try something new. I think this is why Cursor and all those code agent tools have massive adoption. If you're technical, you're more likely to try something new more easily. I think that's pretty common. If you're more technical, it's easier for you to move between different, you know, technology-type products. But there's certainly an education component of saying, hey, you've got to think about things a little differently, because here's a bit of a new paradigm that you haven't seen before, but this is why you should be thinking about it.
Yeah, there's a lot of creativity involved. I mean, I think back to Cursor and VS Code, and I remember people were pretty happy with VS Code, and I thought VS Code was sick. Like, it was a really good piece of software. Uh, and then someone showed me Cursor, and I was like, "Oh my god, I could never go back." Um, so I think people may be satisfied with what they have today, because it requires a certain amount of creativity to understand what is possible using this new set of technologies. Yeah. But if you have a little bit of vision and
taste, you can come and show them what is possible. Uh, yeah. So that's what we're doing. Dude, taste is everything, huh? Especially in, like, I think it kind of comes down to
categories. Yeah. Well, I think the collective skill level of people is going down just from the get-go, you know, cuz they're not necessarily caring about observability until maybe they're already in production and something broke, and then they, you know, finally have to get into it. But like, even for us, when we teach people about tracing, it's not like they knew it already, and I thought everyone knew. My bad assumption. I think this idea, and maybe this is actually a good reason, is that the market is expanding, right? With the idea that everyone can be an engineer, whether you believe it or not today, if you assume it's going to keep getting better, you know, it's not going to get any worse than it is today, well then more and more people are going to consider themselves engineers or builders that can actually build something. Well, more and more of those people are going to get those applications into production and need something to actually monitor and observe, because they don't even know they need it yet. But if they do get to production, they will need it. And so, if they're AI native, right, and in this case they are, then them learning Datadog doesn't make any sense, right? It should be something that's in line with the experience that they've gained thus far through the Cursors and
Sherwood's product and ours, and yeah, the mysterious product there. I mean, there are kind of different personas in a way. Um, I think about this a lot. I think about how, in a lot of ways, I'm building this observability tool for people like myself who have been a professional developer for over 10 years and uh actually started an observability team. So I know a lot about observability, and I'm familiar with the product category and all of the different tools and how to instrument them. Uh but I'm also building a tool that raises the floor for people who are building vibe-coded apps. You know, you're building on Replit or v0 or Bolt or Lovable and you have your thing in production now, and uh it's got users, maybe it's even got revenue, but you need to make sure that thing is running, and when it breaks, you need to be able to fix it. Um, and I don't think that the answer is to tell that person to level up, like it's a skill issue, bro, like learn Datadog. I think the answer is to give them a better tool that makes observability accessible. Um, and maybe they don't even need to know it's observability, and maybe that shouldn't be on my website. Um, that's definitely something I'm thinking about, like how do I build for professional developers who have a lot of experience, but also how do I build for this growing market uh of builders who are, you know, not really engineers. Um, you guys have a lot of people coming to Mastra that want to build agents, you know, and they haven't built anything before, let alone an agent.
Yeah. And like we're teaching them AI engineering concepts, which are kind of founded on regular software engineering, you know, so you kind of have to. I feel like education is probably a part of everybody's company now. It's just like a de facto thing of caring about the users' understanding of what you do. Yeah. Which is why you should watch the AI Agents Hour and also complete the new Mastra 101 course, which Shane worked very hard on. Yes. Yes. Exactly. You get it.
Yeah. No, I mean I think that you guys do this super well. Um, and other people should take note and try to emulate it. Where can we send your sponsorship check, too? Yeah. Yeah. Banking details forthcoming. We did not pay for this endorsement. No, I've never run it in production. And uh Obby and I were talking about doing a session where he just teaches me Mastra, from Mastra 101, from the ground up. Um, yeah, we should definitely do that as a getting started, you know, let's go from the beginning to deploying an agent in, you know, 20 minutes or something. I think we could probably do something like that in the future.
For sure. Yeah. So, are you happy that you're in SF now and you have a company and you're back on the move, or back on the grind? I mean, yeah, man. I'm becoming unemployable. Like, I just love building startups. Um, I guess to tie things back to 11X and to my first company, I mean, there's so many lessons I learned from that. Um, and
probably the most important lesson was to just build something that you resonate with and like align the company to your identity and your interests and and passion. Um, healthcare like medical billing was not my passion or area of interest. I think that probably came through in my pitches. It probably came
through in my sales calls. Um, like I had to go to a lot of conferences and wear a lot of button-down shirts, which is just not me. Um and I know other people who thrive in that and really obsess over the healthcare problem space and have a network there and really enjoy it, and like this is their raison d'être, and that's great, but it wasn't the case for me. Um, and now I feel like I'm just building a product for people like me, a product that I wish existed, a company that I think is cool and that I'm super excited about. And uh, it's just night and day operating when you love it versus when you're kind of lukewarm about it. Um, yeah, totally,
dude. That biggest lesson. Yeah. I mean, yeah. Yeah. Um,
what else? Like, I mean, SF is obviously ripping right now. I think it's a really cool place to be. Um, as an early-stage founder there's no better place in the world to just be a builder, as the entire ecosystem is designed for people like us. Uh it's kind of like, you know, when you're a founder in New York, it's like being a tall person on an airplane. Like, the airplane is just not made for you. Sorry. It was made for people 5'8" or whatever. People like me.
Yeah. Yeah. So, like, I have a great time on airplanes because they're built for me. But like Yeah. Same.
If you're too tall, you're like Well, you know, in New York everyone's going out all the time, and people work nine-to-fives, and they have really diverse interests and lots of hobbies, and uh there aren't as many investors and there aren't as many other builders. And it felt like your lifestyle is not normalized, but in San Francisco, your
lifestyle is extremely normalized as a founder and the ecosystem is designed to work for you. So, that's really awesome. Um, there are tradeoffs. I think it was really hard to hire in San Francisco
and it's in part for the same reason. Yeah, because everyone is a founder or wants to be a founder. Um, so like even if you get like a great early team member, they're with you for like a year, they hit their equity cliff and then they raise and like start something. So I think New York has like a lot of great
engineers who don't plan on being founders. Um, and that's really, uh, yeah, that's really great if you're hiring. Um, it's great for you too because you know them. That's true. Yeah, I have a solid network out there. We got to do AI Agents Hour live in New York. That'd be pretty cool. I'm down. I'll be there September 6th.
YC homies. Uh, just visiting some people from our YC batch and grooving, you know, chilling. I'll be living the travel lifestyle. Yeah, because we have to go. Well, we're going to AI Engineer Paris, so I figured if I'm getting out of the state, I might as well do some other stuff too on the way there. So, yes. Oh, man. Do some intense travel. You went to Japan, right? Yeah, that was a little too intense. That one was too intense, too. The Mastra community is like, there's like this crazy It's so international that I guess you have to go do it. Um, that seems super hard. But the Paris conference will be cool. Like, I think just cuz it's part of the AI Engineer series of conferences. Um, otherwise I don't think we would have gone. Right, Shane? Like Yeah, I mean, we have some team members over in Europe. It made a lot of sense for us to be there, and yeah, it's kind of like a mini Mastra gathering of sorts. Yeah. Are you guys going to talk there?
Uh, no. Just go. No. Yeah. Just flying into Creative Books.
I got waitlisted, so I'll be honest about that. I just got waitlisted. So, oh, probably not going to get in. It's okay. Well, it hurt my feelings, but it's okay. Yeah, I don't actually know if he's running it. I don't know. I don't know if he and Ben are running the Paris one or if they've got a partner out there who runs it, but some French dudes are running it. They seem pretty nice. They gave me a free ticket though, so it's the least they could do, I guess, for waitlisting your boy.
Well, thanks for coming, dude. We have to get into the news soon, but dude, when you launch, come back and we got to talk about the product and walk through it, of course. Yeah, I'll be super excited to share it with everyone. And like, let's do that. Let's do the Mastra 101. We can do that off cycle. Thank you guys for having me. Yeah, of course, dude. Anytime. Yeah. We'll bring you on in a couple weeks and we'll talk you through the beginnings of Mastra and you'll
see it all. Let's go. All right, guys. Take care. Cool. Yeah. Good to talk.
All right. Yeah. I don't think I knew most like I didn't even know a lot of that part of the story. I've met Sherwood many times.
Yeah, I learned a lot. We got it all up. That's awesome. For those of you that are tuning in, we
did some Mastra updates at the beginning. We talked with Richard from Naptha AI and learned about some of the things that he is working on that you can check out. Wizardkit.ai was one of them. We talked with Sherwood
and learned a lot about Sherwood's story, which I think is kind of a cautionary but educational tale, right? Like, you know, it's a continuation of, this stuff isn't easy, but also this is kind of a great time to be a founder, I think. And honestly, even if you're not a founder, if you're just building in AI in general, if you're watching this, it's a great time to be an engineer or in tech in general, I think. So, I don't know, I appreciated his optimism towards why he wants to be in SF and wants to build things. And, well, no matter where you are, whether you're in SF or not, it doesn't matter. If you're building in AI, I think you are on to something that is likely going to lead to good things professionally. That's my
take. That's a fortunate thing about SF, at least for the friendships and network that we made, is that everyone's so truly passionate about what they do and we're so passionate about what we do. It's just, you know, nice being colleagues with people like that, you know. We're also not friends with the people who aren't passionate about what they do. So, just goes to show you, if you want to be friends with the Mastra folks, you got to like what you do. Yeah. Yeah.
you better be opinionated and passionate. You need a little of both. Both of those things for sure.
All right, dude. Should we get into some news? Let's do it. We're going to try to get through a lot of this. There is a lot
here. Um, I think the first thing is a funny article that came out last week, which I think is actually a good thing for us personally, and if you're watching this or listening to this, I think it should be a wake-up call in some ways. And maybe I don't think anyone's going to be surprised by it, to be honest, but there was an MIT report, and this was in an article on Fortune, that said 95% of generative AI pilots at companies are failing. So, you know, you got to pay for Fortune. So, I read like a summary of the article. I didn't read the actual
article, but I think there's a lot of lessons that people, if you're building in AI, can learn. And the first is that this stuff is very hard and people are still figuring it out. The second is, you know, 95% of projects or pilots have maybe failed, but a lot of those pilots probably started six to 12 months ago and the tools are getting better. So, I think that number is going to keep decreasing. Like, that's my personal, you know, optimism and bias, of course, but I do think that there are things that, if you want projects to succeed, you have to do, and it's the things we talk about all the time. Evals: you got to spend more time on evals than you probably think, you know, like, that's important. If you
don't have those, your pilot's probably not going to be successful. Yeah. I think, uh, Sam, you and me were chatting about this article on a car ride home, and it really evolved into us thinking: what are all the things we're seeing out in the wild? What are the 95% doing, and what are the 5% doing? And we kind of made some archetypes, right? Like the 95% winners, sorry, there's only 5% winning. Right. Yeah. 95% losers, 5% winners. Of the 5% winners that we at least see, they are writing evals, right? They're writing evals, but also their project is very scoped, because they're experimenting with AI in nicely scoped projects that the team probably realizes are possible. Yeah. Yes. If you're building a project where you are just having to
hope that the models get better for your project to be successful, that is a project that's probably doomed to fail. You need to be realistic with what these models can actually do. And I think one of the challenges is you get executives, and I'm, you know, definitely painting with a broad brush here. So you get these executives that see,
you know, how someone can vibe code an app and release it into production and make all this money. And that's like someone winning the lottery. Like yes, it exists, but it's very rare. But that executive sees that and says, "We need AI. We need it to be that powerful." And
so they come up with this vision of what AI can do that isn't grounded in everyday reality, right? It's like, you didn't see the 999 apps that didn't make it to production. You saw the one that did. And so it definitely biases, I think. And so when everyone needs to show that they're building in AI, because every big company needs to have some kind of AI story. Yeah. They're going to set standards that maybe are not attainable. So if you're an engineer, you need to be knowledgeable enough to show them what is attainable. And so you can set the scope at the beginning, because otherwise the odds are not in your favor. And setting a small scope isn't bad, because you could iterate quickly and then do the next part of the project, right? You just keep growing the scope after. But we see people that are like, "Oh yeah, we're going to do this platform. That's what we're doing." Like, "Okay, how does it work?" We don't know yet, but we're going to use AI. And we're like, "All right." And it's going to take us six to 12 months. And you're like, "Okay, if you have a project that's going to take you a year to get it into production, it's not going to work."
I mean, because everything's going to be different. You know, if you went back a year ago and you look at now, a lot of things have changed. So, yeah, and maybe things are slowing down, like I think they maybe are, but you need to scope it in a way that you can get it in the hands of people sooner. And I think this is common knowledge in a lot of engineering projects, but I think it's even more important when the technology underneath you is still changing: scope something small that you can get in the hands of users and validate if it actually works, and then build on it or throw it out. But at least it wasn't six months you invested. It's like, if you told me that you could guarantee 95% of the pilots I do are going to fail, well then I want to have 20 pilots, because then I'll find the one that's going to succeed. And the only way to have 20 pilots is to scope small. Also, articles like this cause FOMO and FUD, which is kind of unfortunate, but maybe it'll wake people up, which is good. Yeah. Yeah. Yeah, I think it's one of those things that, you know,
generates a lot of people saying, "Oh, AI is overhyped. It's not going to actually provide the value. This is proof." Then on the other hand, you see people, you know, like us that are already bought in, that say, well, you know, let's talk about the actual projects that are working and figure out how do we get more of those projects so that number isn't 95% and it's, you know, 50%. I also wonder, if you looked at how many technology projects in general actually succeed, it's probably not that far off, huh?
Yeah. I mean, they say like 80% of startups fail in five years or something, you know, so 95%, you know, makes sense. Yeah. But then usually when you say the word pilot, it's like someone sold something to a big company.
At least that's how it feels to me. Yeah. Um but yeah.
All right. Well, let's continue on. Uh, we have a lot to go through. So, let's talk about Anthropic. So, first, Anthropic's in talks to raise apparently 10 billion in new funding. So, they're raising some money. So, that's a lot. They've also released new admin controls for business plans, which basically kind of combines the coding agent and, you know, the app in one subscription. I think they're still trying to figure out how to bill for Claude Code. I think they realized maybe they didn't expect Claude Code to be as popular, and now they're trying to figure out how do we make Claude Code get kind of pulled into the rest of the product. So I think that's what this is mostly about. You know, it's probably a good move. I think ultimately people want free Claude Code. They can't give it away free forever. So they need to try to figure out how do they bundle it. And this is just bundling.
Another packaging issue. Pricing and packaging. Pricing and packaging. Uh, so there are some updates from OpenAI. So we'll shift over and talk about OpenAI. They had updates to the Responses API. This was last week. They have connectors. So within the API, you can actually pull context from Gmail, Google Calendar, Dropbox, you know, through the API. Not just in ChatGPT, but actually through the API itself, which is pretty interesting. And on top of that, they have conversations, which is basically like memory built into your API call. So you don't have to own the database. It's just, you know, I'm assuming tied to however you authenticate that API call.
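For the curious, here's a rough sketch of server-side conversation state with the Responses API, using the openai Node SDK. Chaining with previous_response_id is the documented way to let OpenAI hold the history for you; the model name and prompts are just placeholders, so check the current docs before copying.

```ts
import OpenAI from "openai";

const client = new OpenAI(); // reads OPENAI_API_KEY from the environment

async function main() {
  // First turn: no prior state.
  const first = await client.responses.create({
    model: "gpt-5",
    input: "Remember that my favorite database is Postgres.",
  });

  // Second turn: instead of resending the whole transcript ourselves,
  // we point at the previous response and let OpenAI hold the state.
  const second = await client.responses.create({
    model: "gpt-5",
    previous_response_id: first.id,
    input: "What's my favorite database?",
  });

  console.log(second.output_text);
}

main();
```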
So I thought this was, I don't know, pretty interesting. What do you think? Uh, it's pretty interesting because OpenAI wants to take over the world. They're going to do it one thing at a time. Yeah. I mean, it is just they're trying to expand the influence of what they can do, right? So this is obviously competing with memory providers. It's competing with people that are doing integrations and MCP and all that. I do think, you know, honestly I just feel more and more like Apple's the walled garden. Or sorry, OpenAI, I even messed up the name. OpenAI is Apple, which is like the walled garden. Like, OpenAI is trying to be Apple. Like, you know, you just have everything, which is cool. I just think that, you know, it also kind of locks you into doing things in a very certain way. Like, I wonder whether connectors are their own integration pattern rather than MCP. But then it's built into the API, which is like a really nice touch, right? Because you get those tools for free. Yeah. Um Yeah. Because you don't have to worry about how the connector was used, as long as they support the connector. Yeah. And they maintain the conversation like they already did. I felt like they already had conversation persistence, right? But now there's an API around it. So that's
interesting. There'll probably be memory primitives and everything like that. They're trying to drink everyone's milkshake. Yeah. And they are apparently arranging a secondary stock sale of $6 billion
according to, you know, some sources. That would put the company's valuation at 500 billion. So they're, you know, their employees about to get paid potentially. They're about to get paid. That's cool.
Uh, this one, you know, let's keep talking about OpenAI. So, OpenAI released AGENTS.md, or started promoting it at least. The first time I saw it was last week. And this is a real problem, but I kind of laughed, because I don't know if this is the actual solution or not, because I've seen 10 different people trying to solve this, but this one actually has the backing of OpenAI. It's basically like a readme for agents. The goal is so you don't have to have a CLAUDE.md file and, you know, Cursor rules and all these different files in your codebase. You can have a single one, and then all the agents should adopt it. Yeah, AGENTS.md, I remember we saw it first in Codex, um, when we were playing with Codex back in the day. Um, but dude, this is yet another markdown standard. My hot take is, first
of all, I don't know if it's going to work. It might actually work. They actually have some credibility. So, they might be able to convince enough of the
coding agents to support it. They're probably one of the only few that could; if it's going to work, it's going to have to be a big player. But I feel like they were just jealous that Anthropic got MCP and they're like, "We got to get something." Google has A2A, and you can say what you want about that. Anthropic has MCP; we need AGENTS.md, we need something, we need some kind of standard or spec that we can own, besides of course the actual, you know, completions API, which, you know, not everyone supports but a lot of people do. When we were chatting with the OpenAI folks about Codex, they did mention that everyone internally, or I don't want to say everyone, but they heavily use AGENTS.md with Codex internally, and they had a lot of good things to say. But then, this is just funny, because last week there was an open source project that allows you to translate and transform from one format to all the formats. So you can have one format to rule them all, which will work in each thing, and it's just like, yeah, this is yet another one. That's cool. Yeah. How does the other one work? Does it just compile? Like, you just commit the one file, and then when you clone it's like a post-clone hook, it just creates the other files for you and keeps them up to date or something. Yeah, you pretty much got it. Okay. Well, I just architected that thing. I don't know. Yeah.
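If you want to play with that one-file-to-rule-them-all idea, a post-clone sync is only a few lines. This is just a sketch: the target paths below are the conventions Claude Code, Cursor, and GitHub Copilot commonly look for, but verify them against each tool's docs.

```ts
// sync-agent-docs.ts -- a rough sketch of the post-clone mirroring idea.
// Run with: npx tsx sync-agent-docs.ts (or wire it into a "postinstall" script).
import { mkdirSync, readFileSync, writeFileSync } from "node:fs";
import { dirname } from "node:path";

const source = readFileSync("AGENTS.md", "utf8");

// Files other coding agents look for -- adjust to whatever tools you actually use.
const targets = ["CLAUDE.md", ".cursor/rules/agents.mdc", ".github/copilot-instructions.md"];

for (const target of targets) {
  mkdirSync(dirname(target), { recursive: true });
  writeFileSync(target, `<!-- Generated from AGENTS.md. Edit AGENTS.md instead. -->\n\n${source}`);
  console.log(`wrote ${target}`);
}
```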
Uh, and this is one that's not really news, but I saw it and I thought it was kind of cool. So, maybe if you're looking for you want to see how AI is progressing, you can go to progress.openai.com.
And the thing that's kind of nice about it is it gives you a bunch of different prompts you can click through, and it tells you what answer you would get from GPT-1, GPT-2, text-davinci, which, if you don't know what text-davinci is, you haven't been building in this space very long. So yeah, that's when I first, this is when I joined, like when I was first hitting the API. Um, but then you can basically see all the way through to, you know, today's GPT-5. So you can see different prompts and how each model would respond, which is kind of interesting, because you can actually see quite a bit of progress in not really that much time if
you think about it. Yeah, seven years feels like a long time, but pretty significant progress has been attained. Obviously a lot more recently; like 2021 to 2025 was really the biggest.
So play around with it. It's kind of cool. It's very cool.
All right, we're done with OpenAI. So, this one we can just casually mention because I don't know much about it, but apparently Midjourney and Meta have a partnership, or they're at least in talks, where they're going to try to license some of Midjourney's tech for future Meta projects. So I think for Midjourney it's kind of interesting too, because Midjourney was very hot for quite a while, and a lot of what they did, you know, maybe because the interface was through Discord for the longest time, but a lot of what they did seemed to get kind of eaten up by the rest of the models, which happens to a lot of companies. We had already joked about it, right, that the model companies are probably coming for a lot of ideas around what they already do, and image generation made sense. But I do think that Midjourney does have some distinct stylistic elements, and, you know, maybe Meta has just decided that they can't win in image gen and all the other things, and so it's better just to license someone that already has a pretty good product and use that. Yeah, just yet another, you know, Facebook doing the restructuring and stuff; this just seems like another negative towards the goal of restructuring, just more to deal with, you know, but hey, we'll see. I think it's probably not going to be anything big.
Yeah, we will see. So this one I have strong feelings about. There's an article that says Apple is in talks to use Google's Gemini AI to power a revamped Siri. Okay. So, one, is Apple throwing in the towel on AI? Two, how does Apple feel about Google being, you know, the AI in their device, and, you know, probably the AI on a lot of Android devices, right? So, I don't know. What do you think about that? I think Apple, man, I wish I knew what's going on under the hood there. Cuz like, Siri sucks. Like, it's probably the worst. Like, talking to ChatGPT voice is way better than Siri. Except ChatGPT can't do any of my Apple actions on my iPhone, which seems like a big miss. When you're a developer, like, you know, all developers use Macs and Apple products, yet we don't have this one feature that we get in the actual app, like the OpenAI app. It is weird. And another thing is, I thought they had an OpenAI partnership. Where'd that go? That sucks, too. So maybe if Gemini will be better, I guess that's good. But it doesn't seem on brand for Apple. So I don't know. It's kind of weird.
Yeah. I mean, but then again, I feel like maybe, you know, because Google is the default search provider on iPhones, right? Is it? Yeah. In Safari. No. Uh, well, for the actual search, I'm pretty sure it goes through Google. Oh, the actual, like, search thing. Yeah. Yeah. Google pays Apple, I don't know what it
is. Someone in the chat, if you know, let us know. It's like billions of dollars a year. It's a significant amount of money to be the default search provider on all iPhone devices, because Google wants to own search. So they pay for that, and they know they make a lot of money from the users that are using their search.
So they'll pay Apple a lot of money. I wonder if Google is kind of doing the same thing here, where, you know, Apple's like, "Well, we can't win in AI, but at least we can make a lot of money on the device distribution." And so someone that is winning in AI can just pay us to be the default AI provider for Siri. You know, then Siri is just the name and it uses Gemini, like Siri powered by Gemini or whatever, you know, and then if they want more money, it can eventually be Siri powered by OpenAI or ChatGPT or whatever, you know. I don't know. Dang. So, like, as a user I don't even
care if it's Siri or not. I just want it to work. Yeah. I think that's a lot of people. That's probably why Apple's probably saying, like, maybe they're still working on it, but they're throwing in the towel. Yeah. Yeah. Justin says, and yeah, I didn't know the amount. I don't know. I think it's way more than a billion, but I don't know. But yeah, pay like one bil to be the default search engine. And then, I agree. Siri is bad. So bad. Make it better and do what you have to do, because clearly you haven't made it better yet. I mean, Apple Intelligence, that was a joke. That was marketing. This is marketing, right? There's nothing intelligent about it, to be honest.
No, it's pretty bad. Pretty bad. All right. Uh, but speaking of Apple,
xAI, and we talked about this before, but xAI is actually suing Apple, apparently, and Elon is accusing them of colluding to suppress competition in AI apps. So, they apparently did just announce this lawsuit today. So, this is like fresh news, but we talked about it, I think, last week. And then, you know, there was the back-and-forth between Elon and Sam Altman, where Sam Altman said it was a skill issue and all that stuff. But, you know, xAI must think they have, you know, a case here. So, we will see. But I'm sure it'll play out in the courts and we may never hear about it again, or it will take a very long time, but it's obviously happening. So, another interesting xAI-related post. So, xAI opened up Grok 2. The weights are on Hugging Face now. So, Grok 2.5, and there was, uh, Elon Musk did say that Grok 3, you know, in quotes, will be open source in about six months. I don't know if I believe the time frame, but
like a year. Yeah. Yeah. Double it. That's my engineering management trick. Whatever timeline you get, at least double it. When was the Cybertruck supposed to come out? Let's just use that as a proxy. Okay. It's like, if I ask a normal engineer, I double their estimate. If I ask Elon Musk, I triple or quadruple, so maybe 18 months. Um, but anyways, I think it is cool that more open-weight models are coming out. That's a good thing, especially if you want to run some of these open models yourself, if you want to, yeah, compare and contrast and be able to run your own inference. I think it's good.
I'm a fan of more open models. Yeah, me too. Even if I don't use them that much, I almost just want to know they're there. It gives me comfort that I could run this stuff if I had to. If I really needed to cost save,
I wanted to do fine-tuning. I haven't really done it a lot. I imagine most people aren't, but I do think that, you know, people want to know that there's like open competition and the open models are still relatively competitive.
Yeah. And that also inspires people building in open source. You know, they can look at the code for inspiration and, you know, try to do their own thing. Yeah. And speaking of open models, DeepSeek 3.1 was released. So, that's pretty cool. Yeah. And it says, you know, it's "our first step toward the agent era."
So, it has one model, two modes: think and non-think. So that's kind of like GPT-5. Yeah. You know, think mode reaches answers in less time than DeepSeek R1. Stronger agent skills, so it's better at multi-step agent tasks. And then you can try it out on, you know, chat.deepseek.com, or obviously, you know, you should be able to run it yourself, I imagine,
probably in LM Studio. So, the agent era though, what does that even mean? You know, I don't know. Is this a release that's like one of those releases that is building upon something for the future? Maybe. I guess the agent era is coming. Um, yeah. Well, Tyler, you know, Tyler from the Mastra team had speculated that it might be a precursor to their new R2 architecture. Some of the reception around it was, you know, kind of mixed, but the architecture's maybe a little bit different, so it might be leading to what they're trying out or what they're moving towards. And so they kind of merged their chat and reasoning models into one model, and so it's kind of like the first version, which is similar to what GPT-5 did, right? It's like you just have one, and ideally the model chooses if it needs to think deeply or not. I think that, you know, that's probably what people want, with the idea that they can
tune some knobs or dials if they need to, but overall they want it to just figure it out. Yeah, and maybe this is true or not, but maybe reasoning models themselves were just a point in time, you know, and then that's more going to be folded into the model architecture. Because, like, do people use DeepSeek for regular chat, or only R1 for reasoning, if you're using it? Well, I mean, I think from what I've heard most people used it because it was a good reasoning model, right? That's what I would imagine most of their usage is. But I think this is actually a trend, in that ideally you don't need to make these decisions, because I would say the same thing could be true of even reranking models. Everyone was really big on reranking models if you're doing RAG, and I still think they're valuable today, but also a lot of people are just using general LLMs to do reranking (there's a rough sketch of that a little further down). You don't even need a specialized model. And so if you think about it, why can't the, you know, one API decide, okay, you want a reranking task, use a specialized reranking model, and it just routes for you rather than you having to specify? And same thing with thinking, you know, reasoning or not, shouldn't the LLM decide? If I ask you to build a research paper, well, you probably need to do some pretty deep thinking on that. If I ask you what two plus two is, you don't need to think very long about that. Yeah, and it's kind of like an agent network, but the model itself is, like, routing within it. So that's probably how the architecture will change, especially after GPT-5. It'll probably be more like that mixed-model architecture. With that being said, we should probably get Professor Andy
back on the show to teach us what that even means. Yeah, we act like we're the experts. Uh, we're figuring this stuff out, too. But Andy knows. He'll teach us. Andy knows.
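Picking up the reranking aside from a moment ago, here's a minimal sketch of using a general LLM as the reranker instead of a dedicated reranking model, again with the AI SDK's generateObject. The scoring prompt and the 0-to-1 scale are arbitrary choices, not a claim about how any particular product does it.

```ts
import { generateObject } from "ai";
import { openai } from "@ai-sdk/openai";
import { z } from "zod";

// Ask a general-purpose LLM to score each retrieved passage for relevance,
// then sort by that score -- the job a dedicated reranker model usually does.
export async function rerank(query: string, passages: string[], topK = 3) {
  const { object } = await generateObject({
    model: openai("gpt-4o-mini"),
    schema: z.object({
      scores: z.array(z.number().min(0).max(1)).length(passages.length),
    }),
    prompt: [
      `Query: ${query}`,
      `Rate how relevant each passage is to the query, 0 to 1, in order:`,
      ...passages.map((p, i) => `[${i}] ${p}`),
    ].join("\n\n"),
  });

  return passages
    .map((text, i) => ({ text, score: object.scores[i] }))
    .sort((a, b) => b.score - a.score)
    .slice(0, topK);
}
```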
Yes, we should bring Andy on. In additional open source news, Qwen Image Edit is the new open-weights leader in image editing. Apparently the quality is comparable to GPT-4o for image editing tasks. And it's, you know, Apache 2.0 licensed. The weights are available on Hugging Face. And I've seen it, you know, compared to FLUX.1 Kontext as well, and it seems to be better than that. But it's good to have more open models. They show some examples of, you know, here's the original, add a design to the mug. As you can see, that's pretty cool, that you can see the different versions of Qwen versus Flux versus GPT-4o. Of course these are cherry-picked examples, right? But it's just, yeah, cool that you could do this.
Like look at this. Like why do why do they like look at this one? What's going on here?
And why did she go from relatively neutral to sad in all these images? Maybe GPT-4o is depressing by default. Yeah, I guess. Okay, now I feel better about it. So, it changed the facial expression and changed the instrument. So, those are obviously better than that. Yeah. Yeah,
that's dope. Change a strawberry into a blueberry, just turns it blue. That's funny.
Anyways, image editing is fun, but it's good to have more open models that can do that. All right, now we've got a whole bunch of other news. So, this one is from Cline. And for those of you, if you're looking for a good article to read that just kind of talks about some general lessons on, you know, context windows and building: again, we mentioned previously that last week we had a context engineering meetup at the YC office where our co-founder Sam was talking, but I think more and more when I've been talking to people, a lot of people are like, well, the context windows are big enough, does that mean we don't need RAG, we don't need any of these other, you know, tools? And the answer still is no. Like, bigger context windows are good for certain things, but they don't necessarily solve all your problems. You can't just shove a bunch of nonsense into the context and hope for the best. So I think that's why context engineering in general has kind of become a hot topic. But uh, Cline has this thing called Focus Chain, which they talk through, you know, in a little bit more detail: how even though you have bigger context windows, you have some of the same memory issues and some of the same context drift and context rot and all these terms that are popping up. But essentially, you know, there's a new feature that tries to tackle this head-on. It's a context-forward approach to task orientation. So, it seems like almost like a task manager of sorts, like a planning tool. But it's a useful article if you're just looking to dig in a little bit more on how Cline is approaching this problem, and maybe if you're going through similar problems, you might have ideas on how to improve your agents as well. It feels like context is this area where the message history and the messages and the content and everything is actually, it's like separate from
the context and you're injecting into it. So even if you had a million like a million token count window like it doesn't mean that you you should use all a million tokens to achieve a task, right? you you still would want to be like cognizant of what the prompt is and what the context given to the prompt is and then is that enough? Where do you think the like do you think that like
what's the use case for like what would someone do with the a million context window like on the on day one? Are they going back into rag and removing some things or like what do you think? So like why would people actually use a million context window? Yeah, I'm trying to wonder why. I think there are probably so all the
models companies what what they all benchmark against is that you can basically do needle in the haystack. So you can fill up a million token context and you can say what was this one specific thing that was mentioned in that context and in that it's kind of like a you know a reax of of sorts like it could all the models now can do really good on large context like picking out when did I ask you what
was the date of my anniversary well if I mention that exactly one time it can go through and it can pick that out right like that's that's something that someone could that the models do relatively well There's way more to that. So if you think about like how to piece multiple pieces together from that context in
interesting ways now that has to reason across all that context and make those those uh inferences and kind of those connections where if you could just give it the you know all the times I mentioned something similar to that and compress that context now the model has a lot easier time like connecting those things. Yeah. So I think so I don't know
I think a lot of people will shove huge context in and just see what the models can do, but unless you're doing a very simple task like that, filling up the context window isn't going to work very well, I don't think. Yeah, because coding agents are the ones where I'd say you could easily fill that window up with files, right? Because some people have big-ass files, and it's kind of sad that even with the bigger context window you still need more paradigms for operating on a large codebase and so on. Yeah. Well, all the coding agents now basically just use a grep-like tool. Grep. Yeah. Yeah. So, you know, I think filling the context up with a lot of code doesn't seem to be the right pattern. At least that's not the pattern that's working today. Yeah, and you can see when Cursor is thinking, it's traversing many paths in the codebase, and then the summary of those becomes the context it'll then use for what I asked it to do. Which is, yeah, intelligent, in that it's probably doing some retrieval on what's happened in the thread and all that type of stuff. So pretty much everything we do at Mastra, actually, so good for us.
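To make the grep point concrete, here's a rough sketch of that pattern: a search tool the agent can call that returns only the matching snippets, instead of loading whole files into the window. It uses plain Node APIs and is purely illustrative; real coding agents typically shell out to ripgrep or something similar.

```typescript
// Rough sketch of the "search the codebase instead of loading it" pattern:
// a grep-like tool an agent can call to pull only the relevant lines into
// context, rather than stuffing whole files into the window.
import { readdirSync, readFileSync, statSync } from "node:fs";
import { join } from "node:path";

interface Match { file: string; line: number; text: string }

export function grepTool(root: string, pattern: RegExp, maxMatches = 50): Match[] {
  const matches: Match[] = [];

  const walk = (dir: string) => {
    for (const entry of readdirSync(dir)) {
      if (matches.length >= maxMatches) return; // keep the result small on purpose
      const full = join(dir, entry);
      if (statSync(full).isDirectory()) {
        if (entry === "node_modules" || entry === ".git") continue;
        walk(full);
      } else {
        const lines = readFileSync(full, "utf8").split("\n");
        lines.forEach((text, i) => {
          if (matches.length < maxMatches && pattern.test(text)) {
            matches.push({ file: full, line: i + 1, text: text.trim() });
          }
        });
      }
    }
  };

  walk(root);
  return matches; // only these snippets go into the prompt, not the whole repo
}
```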
So, continuing on, Cohere has released Command A Reasoning, which is enterprise-grade control for AI agents. It's essentially a model that you can deploy and use. It's just interesting that Cohere has a bunch of reranking models and such, but this is a general reasoning model. I haven't used it and don't know much about it, but it has some benchmarks against some other models, deep research specifically. You know, Cohere is very enterprise-focused, so they're trying to say this is a better reasoning model specifically for enterprise use cases.
Yeah, this is weird. It's like the opposite of what we're seeing in the other one: they're making a specific reasoning model versus undoing that. That's interesting. Yeah, I mean, we'll see what ends up winning out. Yeah. So, this one is interesting because I think it's a precursor to what we're going to see more of. I'm not even that
interested in it specifically; I'm never going to try this out, right? But Deel launches an AI workforce. I do think more of these companies are going to launch tools to basically build your own helpers or agents in their platform, right? So this essentially allows you to turn on or build AI agents around, I'm assuming, your specific needs. So you need help hiring, or you need help managing time off. I mean, these names I think are kind of ridiculous. Yeah. Time Buddy, Border Buddy, Schedule Sheriff, you know, Goodbye Genie. I
mean, they're just trying to play off some, I guess, cutesy name. Cringe. Yeah, that's super cringe.
But it has specialized AI agents that you can essentially turn on or, you know, quote-unquote hire for your team, right? And I imagine you pay them based on tasks or whatever. But I do think we'll see more of this kind of thing, maybe with some better naming conventions. But I do think that, yeah, this is just a precursor to what I think we're gonna see. So Deel is kind of early in this. Will Deel have an AI for being a spy? Maybe. You think they'll have one for that? I don't know. I don't see
the Slack spy. Slack spy. The company with the spies, right? I was wondering. Yeah, I mean, yeah, I think I remember hearing that. Yeah. So, I don't know. That one was probably on the list; it didn't make the cut. They liked seven; that was number eight. It didn't quite make it.
I do think HR companies will all follow suit if that thing takes off. Even Salesforce is going to have these specialized agent products. It's interesting that they use verbiage like "hire them." Makes it more human. Well, I think if they convince you that you're hiring them, rather than it just being a feature you turn on, it makes you compare it to hiring a person for that same job. We know that agent isn't going to completely solve the same thing, right? But it'll solve maybe a subset of that person's tasks. So if you could say, I normally would hire three people, but I could actually hire two and give them this tool and it makes them more efficient, well, then I save myself one whole person. So I can compare it to that salary, because now my team can be significantly more efficient with fewer resources. I wonder if they all have personas, too. Like, the payroll detective is like an... Yeah. It's like the financial ones are going to have some auditor, and the auditors are going to be a pain in your ass. A pain in the ass, you know, just checking the pedantic details of everything. Yeah, like, "you did not fill out this expense report correctly." Yeah.
Yeah, I don't know. I imagine they will, just because they'll try to make it more, yeah, HR-friendly. Yeah, they want to make it feel more HR-friendly as an agent. I honestly think that over time maybe they won't, and they'll go back to just providing the tools for the person that's doing the job. I think it's actually marketing to make it feel more like a person. Right now a lot of tools, you know, we just had Sherwood on, right, and they have Alice, which is like an SDR. I think giving it a persona is going to be useful for a point in time, but my hunch is that long term you're just going to roll these things into one global agent that maybe has some kind of persona, or they just become features of a product that you don't need a specific persona for. But I don't know. We'll see. Yeah. Hopefully no one falls in love with the Border Buddy or whatever, you know.
Yeah. And then they upgrade the Border Buddy's model and it's like, "Oh, no. You've changed.
You're not my boyfriend anymore." Yeah. Yeah. Why have you changed?
Something caused you to change. Yeah. I mean, we joke, but it's a real concern. But the show must go on. This one was not recently released, but I thought it was kind of cool. Hugging Face has introduced AI Sheets. This was a while back, but we never talked about it yet. I think I've just been seeing more and more people trying to figure out ways to curate data. This is trying to give you tools to do better data labeling and data curation, using open models as helpers for that. That's kind of cool. I think more and more it's going to be about the quality of your data and your data sets. We talk about the eval loop, and this may be a tool that fits into your eval loop in some way. The data set problem is solved in all these LLM ops companies and such, but even Hamel's like, you just need a simple tool, and this is pretty cool. Plus you can export it too, which we've done. Yeah. I think it just gives you a way to start interacting with and curating the data, and it provides little tools to make that a bit better. You don't need it, of course. I think you could build it yourself.
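If you did want to build the simple version yourself, the core loop really is small. Below is a hedged sketch of the "open model as labeling helper" idea: callModel is a placeholder for whatever open-model endpoint you'd point it at, and the label set is made up for illustration.

```typescript
// Minimal sketch of LLM-assisted labeling: loop over rows, ask a model for a
// label, keep the result as a new column you can review, then export it.
// `callModel` stands in for whatever inference endpoint you actually use.

type Row = { text: string; label?: string };

export async function labelRows(
  rows: Row[],
  callModel: (prompt: string) => Promise<string>
): Promise<Row[]> {
  const labeled: Row[] = [];
  for (const row of rows) {
    const label = await callModel(
      `Label the following support message as "bug", "feature", or "other".\n\n${row.text}`
    );
    labeled.push({ ...row, label: label.trim() });
  }
  return labeled;
}

// Export the reviewed rows as CSV so they can feed an eval set later.
export function toCsv(rows: Row[]): string {
  const escape = (s: string) => `"${s.replace(/"/g, '""')}"`;
  return ["text,label", ...rows.map((r) => `${escape(r.text)},${escape(r.label ?? "")}`)].join("\n");
}
```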
I think Mastra is going to have helpers to help you do that. A lot of the other observability providers will have helpers, but it's just another potential tool for the tool belt. Yeah, because the big eval platforms want you to use the whole platform, right? And to get a data set feature that's essentially a spreadsheet, you have to have everything else too, all wired up. But obviously Hugging Face is kind of the npm of this world, right? They have swag, and so this... Yeah. And it's meant to feel more open, right? So it feels like you're not locked in to a certain tool provider. It works with open models. So I think the idea is it's just a little bit more of an open tool. All right. And this last one is just an interesting article, an interesting thing we can talk about to end on today. There was an article in The New Stack, and I think we've talked about things like this before, but there's the statement that AI agents are the new APIs. That's basically the idea. The article is interesting, of course, but I think that's the headline, right? Are AI agents the new APIs? Rather than having your code talk specifically to an API, are you going to have your agent talk to another agent, rather than an API in the middle? Oh, that's interesting. So there are no tool calls; they're just other agents who know how to do that, or
someone who owns that tool, or whatever. Yeah, I don't know about that. Yeah, I don't know. I mean, I think it's interesting. There's the idea that, you could argue, is MCP the new API? Maybe. But if you put a layer on top of MCP, are agents going to be the new API? I don't know. Not today. But yeah. No, that's interesting.
Yeah, I've yet to see anyone replace an API with an agent, though. Yet. I think it costs too much money. Yeah, I mean, in the grand scheme of
things, yeah, you have to think about what's the use case where that would actually work, right? If it's deterministic, why do you need an agent? But I could see there being endpoints that are a little bit more natural language. I had a blog post at one point saying APIs should have a natural language endpoint. It knows what you have access to, so you can just ask it to do the things you would normally use the API for. Then, if you want to integrate your app with that API, you just know there's a natural language endpoint and it can do most of the things pretty well, assuming the model behind that endpoint is good. Now your app can just ask questions and doesn't have to inspect the API or build deterministic routes for how to handle those different things. It leaves it up to the API to decide. And then I think MCP came out and helped standardize that. Well, no, MCP is kind of the replacement for the API, and your agent can just consume that and know how to work with it. So it's kind of solved some of the same things.
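For what that natural language endpoint could look like in practice, here's a hedged sketch: the /ask route, the invoice operations, and the keyword-matching stand-in for a real model call are all invented for illustration, not taken from any real API.

```typescript
// Sketch of a "natural language endpoint": the API keeps its normal routes,
// but also exposes /ask, which takes a plain-English request, lets a model
// map it onto one of the operations the caller is allowed to use, and runs it.
import { createServer } from "node:http";

type Operation = (args: Record<string, string>) => Promise<unknown>;

const operations: Record<string, Operation> = {
  listInvoices: async () => ({ invoices: [] }),                       // placeholder data
  createInvoice: async (args) => ({ created: true, customer: args.customer }),
};

// Stand-in for a real model call so the sketch runs without a dependency;
// a real version would send the request (plus the allowed operations) to an
// LLM and parse a structured answer back.
async function callModel(request: string): Promise<{ op: string; args: Record<string, string> }> {
  return request.toLowerCase().includes("create")
    ? { op: "createInvoice", args: { customer: "acme" } }
    : { op: "listInvoices", args: {} };
}

createServer(async (req, res) => {
  if (req.method === "POST" && req.url === "/ask") {
    let body = "";
    for await (const chunk of req) body += chunk;

    // The model picks which known operation the natural-language request maps to.
    const { op, args } = await callModel(body);
    const handler = operations[op];
    const result = handler ? await handler(args) : { error: "no matching operation" };

    res.writeHead(200, { "content-type": "application/json" });
    res.end(JSON.stringify(result));
    return;
  }
  res.writeHead(404);
  res.end();
}).listen(3000);
```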
But yeah, I don't know that there's a really good example of an agent that's replaced an API, that I'm aware of. Yeah, at least not yet, because mostly people are still building co-pilots and chat-based things. I mean, I could see a future where you call it an API, but under the hood it just does a bunch of agentic execution over a thousand things and then you get the result when it's done. That's still an API, though, even though there's an agent. So I guess it's true: agents are APIs and APIs have agents. Yeah, if you deploy a Mastra project, every agent has an API. So, I mean, yeah, that's true, because we already think this way. So yeah, I think it's kind of: yes, it's supported. I don't think it's going to replace existing APIs. I think the agents you build may interact with those existing APIs, potentially through MCP or potentially just consuming the API through tool calls, right? Yeah. But all right, dude. That was a show. That was a show, dude. Oh, that was a lot of news, and lots of things were happening, which is great. Yeah, had some good guests. Yeah, great guests. Fun times. Yeah, we're starting to cook
up a pretty good show for next week. So, I'm excited about that. Oh, for those of you watching, we're not going to be doing it on Monday next week. We're
here Monday noon almost every week, but it's a holiday, so we decided we're going to do the show on Tuesday. You'll still see us; it'll just be Tuesday next week. So if you're sad that you didn't get to see us on Monday afternoon, just know you only have to wait 24 hours. Exactly. We'll still be on your computer screen or in your earbuds or whatever you're listening on. And that reminds me, if you're watching this on YouTube, please subscribe. If you're not watching on YouTube, if you're watching on Spotify, if you're listening on Apple Podcasts, if you're watching on X, please give us a rating, assuming it's five stars. We appreciate that. And, you know, like, subscribe, do all the things that everyone always says we have an obligation to say to you. So please help us out. Please,
yeah, please share the show. We're trying to get big on YouTube. We're publishing great content there.
So your subscribe is not only for this show, but for all the other educational things we do. So please do that whole thing and check out some of those videos. Alex has been on fire, killing it. Even that heads-up game thing I saw, that was pretty
tight. That was so much good content. We're trying to get better at it, too. So, yeah. Yeah, check it out.
Ultimately, you know, we are all the admirals of AI, right? We're sure trying to teach people what we're learning along the way. And so hopefully you can get some value from that. Yeah. What else should we plug before we
go? We have the TypeScript conference. Oh, I guess tsmp.ai. We should drop that in the... We need a TypeScript AI conference. Yes. So, go to tsmp.ai. We're not selling tickets yet, so you can't actually register yet. Just put in your email so you get notified when registration opens. This is public, but don't quote me on this, everybody: I'm pretty sure the way we're going to do it is there's a set number of early bird tickets, so if you're on the list you'll get an email when it goes live and you can get one of the cheaper early bird tickets. Once that first set of tickets is gone, it goes to full price. We only have 300 spots, and we do expect it to sell out. Yeah, we may have room for a little bit of overflow; we'll see, we're trying to figure that out. But 300 spots in SF. Even if you're not in SF, you should go put your email in anyway, because you'll then get all the information about how to watch it virtually. So even if you can't come to SF, drop in your email; you'll get notified of how you can watch it virtually and attend online. And if you can get to SF, come hang with us.
Yeah. Yeah. All right. What else do we have to promote?
I don't know. You want the book? Get the book. Yeah, that's it.
Get the book. You know, give Mastra a star on GitHub if you haven't already. Please do that.
16.1K now. We're on the way to 20. Road to 20 begins.
Let's do it. Give us some stars. Help us out. And yeah.
All right, dude. Let's call that a show. That was a show. Let's call it a show. Yeah. Thanks everyone. See you.