Mega Guest Episode! Databricks, CodeRabbit, MongoDB & Osmosis
Today we have 4 guests joining in! We have Andre from Databricks/Neon, Erik from CodeRabbit, Gaurab from MongoDB, and Professor Andy from Osmosis. We also highlight some of the recent AI news from the last few days. This includes model provider updates, Vercel workflows, and even some AI music.
Episode Transcript
[intro music]
Hello everyone and welcome to AI Agents Hour. I'm Shane. I'm here with Obby. Obby, you got a new hat on today. Yeah, dude. I'm shilling something else
today. Always. Always. It's good to see you. Um, for those of you that are not familiar, AI Agents Hour, we meet almost
every Monday and we talk about AI news. We bring on some fun guests and learn what they're doing in the world of AI and we talk with you. So, this is live. Please use the comment feature whether
you're on YouTube or X or LinkedIn or wherever else you might be. And if you are listening to this after the fact, please give us a five-star review on Spotify or Apple Podcasts or wherever you're listening to it at. We do appreciate only the five-star reviews. Uh otherwise, please refrain from leaving a review. Yeah, fine. But yeah, what's up, dude? Good. It's good. It's busy. Last week
was super busy with all the conferences and stuff. And this week is not as busy. So, that's good.
Yeah. Time to get things done before our conference next week, only to get busy again. One week off to make moves. Uh yeah, for those that are not aware, TSAI Conf is next week. So, if you have not got your spot yet, tsmp.ai is the URL.
We have that somewhere right here. It's on the screen. You can register and attend virtually for free. So, no reason not to do that. And check out some of the amazing speakers we have.
And if you're in SF, there's definitely always things in and around events. So, plenty for you to do as well. Lots of afterparties and pre-parties and all things in between. All kinds of Yeah. Uh, but we could probably just get
right into it because we have a packed episode today. We have four guests. I think that's a new record.
New record for sure. I think we've had three before, but four today. So, that's awesome. But
maybe we can start with the first bit of AI news and then we'll bring on, you know, Andre as well. Yeah, let's do it. Let's kick us off. What's the first thing on the docket? Um, let's go through like the model news
because I'll just bang those out real quick and then more stuff later. Yeah, as always there's model news, right? There's always model news from the model providers. So, MiniMax M2 was announced, and it's, you know, said to be an advanced model that achieves global top-five status, surpassing Claude Opus and ranking just behind Sonnet 4.5. So coming today, you know, so it should be out. I didn't know if I saw the actual announcement tweet, but it should be
out. If it's not out now, it will be soon, I'm sure. So that that is interesting. Anytime there's a new model
that especially is good around, you know, writing code, which this seems like it is. And the bold claim that it surpasses Opus, you know, is super interesting. Uh MiniMax is a cool company. They're based in Shanghai. So once again, another Chinese model coming at you. So this is cool. I want to play around with this. I think they said it was free for a couple weeks as well, and maybe it's going to be open sourced. So I heard a couple different things. So need to validate all that.
Also, Mistral AI Studio was announced. Mistral's kind of been, you know, somewhat quiet, I think, for a while. Boring, I was gonna say, but they're trying to get back in the game. Yeah. They want you to know that they're still there, and they're building basically a studio to build agents and evaluate those agents. Looks like fine-tune them too. I'm not sure if it just uses Mistral models or if it allows other models. I haven't dug deeply into this. I just know it came
out and some people were talking about it, so we wanted to bring it up here. Yeah, this is cool. I mean, this is kind of where I think a lot of model companies are going to go, right? to have like platforms, product platforms.
Um, yeah. So, we'll see. And last but not least for today, uh, there's an open-source code agent from Kimi (Moonshot) coming. Doesn't say when, but it just says it's coming. So, an open-source code agent, which I think is good. I like to see different open-source agents because it lets you figure out how you would build this yourself, and gives you more information around just how coding agents work. So it's always good to see uh some of the model providers open sourcing some of this stuff. Yeah. And Moonshot is from Beijing.
Chinese companies coming after you. Yeah. All the Chinese companies dropping new models. I mean, in one part there's just so many of them. It feels like there's a new one every day.
There's just a ton of Yeah, we need to do the breakdown. Yeah, I think we need like a a map of maybe should be like a geographical map of like where in the world these models are coming from and what models are there because and then there's different versions like this one's a coding agent, right? Which is just probably built on
top of their models. But I do think it's hard even for me, and I spend a decent amount of time looking at this stuff, keeping track of all the different options, especially if you're just, you know, doing the happy path of trying the big model providers and you haven't actually experimented with, you know, Qwen or any of these other potential models that are actually pretty good. Some of them work really well. I mean, we mentioned last week, you know, Airbnb is using Qwen pretty heavily in Airbnb's agent. So, I think it's worth experimenting. All right. Should we just move right on to the guest and talk about the next AI news? We'll sprinkle in some AI news between guests here today, but we got the model
updates. Those are some of the big ones. We have a few others we'll talk about as well, but I think it's time now, drum roll, to bring out our first guest. So, we have Andre from, I guess, Neon. Hi. Yeah, Databricks. But it's great to have you on the show. Uh yeah,
glad you're here. Yeah, thanks for having me, guys. I'm already laughing because I have the same trouble. I'm always saying
I'm Andre from Databricks, focusing on Neon, or like Neon at Databricks. Yeah, one of those. All of them work. Both. Best of both worlds maybe. Uh yeah. Can you give us just the short uh TL;DR? Who's Andre and what are you doing here? Yeah, for sure. Um I'm a developer advocate. I joined Neon in January, and then a few months ago we were acquired by Databricks. So now I work at Databricks, but I'm still a dev rel focusing on Neon. And uh Obby knows me.
Um I hang out in SF. Uh we have been hanging out at different events uh a lot in the last few weeks, but especially last week, as you guys mentioned already. It was pretty crazy, the conf and all the other events. Um but yeah, um based in the Bay Area, originally from Germany, so that's the accent you're hearing. And then I moved here four years ago. Nice, dude.
So, like, uh, how crazy was the acquisition, having been there? Yeah, I think it went like super smooth. I was surprised how fast everything went. Like I was expecting this to be like a, I don't know, months-long thing before anything really happens. But yeah, I feel like a Databricks employee already. Um we all fully moved over, new Slack and everything. But it was smooth, like there was almost no interruption to our day-to-day, and from the beginning
the communication was basically like, keep shipping, don't slow down. So um the Neon team has been doing exactly that, and it has been a lot of fun. And Neon today is like a product itself, right? It's running as its own product. Totally, yeah, we still have to figure out the communications around that a little bit, like TypeScript AI Conf, right? Like the second you guys announced it, I'm like, I want to sponsor this so badly, but then it's like, do we sponsor it as Databricks or as Neon, right? So like we still have to figure it out a little bit, because it is confusing if Databricks shows up at a TypeScript conference, right? So I guess it's Neon
then, but yeah, uh other than that I think it has been super smooth. That's awesome. Yeah, I imagine over time Neon starts to feel like a product offering of Databricks, right? But in the beginning it's strange, because people are used to Neon as its own thing and Databricks as its own thing, and it takes time for people to kind of mold the two together. Totally. Yeah. I think the cool thing about Neon
was, and much like Supabase I would say too, like when the agentic coding moment happened, both databases were like the easiest to integrate, um, and it was almost like it was just ready, like, for whatever reason. Like I know Neon already had platform-as-a-service built in, so you could easily provision databases for people and stuff, and then Supabase probably jumped in because of Lovable. Like, what do you think about that? You were pretty much part of that too, like in terms of dev-rel-ing it? Yeah, I think most dev tools that focus a lot on like CLI experience and just the developer experience in general, where you want to be able to automate anything, you provide an API for everything. That makes it very easy to wrap that in a tool or an MCP server, right? Like anytime you have a great developer experience, especially with an API, I think you're in a good position now to take advantage of that and, like, uh wrap it in tool calls or um
yeah, provide a good agent experience as well. And then for Neon specifically, I think where we're also just in a good spot is the serverless approach. Uh so for those who don't know, Neon is a serverless uh Postgres offering where uh we split storage and compute, so we allow them both to scale separately, including scale to zero, uh which means um it's good at utilizing your resources efficiently, right? And that's especially important if a lot of agents are doing a lot of stuff and maybe something is created and then never used again, right? Like you want to make sure it's not running a Postgres instance 24/7 in the back, because that's very expensive, and that's where serverless is, uh I think, very powerful, or just very um useful and valuable. Um so I think the architecture and just the way it was designed fits very well for the agents use case. Um we can talk a little bit more about that in a second, like kind of what different databases we think of, right, like agent-managed versus, like, I don't know, platform databases. But yeah, I think it was just like right time, right spot. Yeah, for sure. Is most of your traffic or usage coming from agent, like agentic use cases, or is it still like platform DBs like you're mentioning? I think it's slowly creeping towards 90% uh agent-based. Oh wow. It was 80 a while ago, now it's 90%, and that includes both uh people using Neon through the MCP server, so like an actual dev writing prompts and an MCP uh server call is invoked, right? So like that we also count as agentic, but the vast majority of that is like Replit agents managing Neon databases, um and other tools like Replit. Yeah, I mean that's interesting because I used Neon, you know, before the agent
yeah, all the agentic stuff really became popularized. So it's amazing. You know, I thought you were all killing it then. It was just really so easy to spin up a database, especially since, you know, oftentimes I was spinning up lots of different projects to, you know, prototype things, and doing it all by hand, you know, back in the old days, right, as we did. Uh but yeah, so that's awesome. That 90%, that's a huge number. That's a funny comment, because I also, like, damn, I always thought Neon was by Vercel. That's pretty funny. Um yeah, I kind of had that same assumption back then, because you guys replaced Vercel Postgres when the marketplace came out, and Neon was the database. Yeah, they're so much better. It was so much better. Oh my god. Think like there's some collusion here definitely, or, you know, is that Vercel's thing? So that's pretty funny. Yeah. No, I think from the very beginning we tried to be very good at that white-labeling experience. There's actually a couple of cloud providers that resell Neon under the hood, and it's always up to them if they want to say it's Neon or not, and some are very open, some don't really care. Um but yeah, and Vercel did that in the beginning, right, it was Vercel Postgres, and then they opened up to the marketplace. So totally understand the confusion there,
but yeah, uh I have some cool stuff to show you guys if you're interested. I have a little demo. Please! Yes, we love when guests bring demos. I think the audience
likes to see real things that people are working with and building. I have a real thing here. Uh tell me if you can see the screen. Yeah, looks good. Okay. Um Obby knows this, because uh we were hanging out at a meetup, but I was building a codegen tool. Um it's open source. I called it Eileen. It starts with AI. I thought that's a cool name. Uh and it's using obviously Mastra, because how else would you run your agent uh loop, and then it's deployed as a Next.js app uh on Vercel, and it's using assistant-ui for the chat, and then a really cool tool that I want to plug today as well called Freestyle, freestyle.sh, uh where you can provision uh git repositories and dev servers, which works like very much optimized for codegen. So a really cool tool as well. Um
so yeah, we can talk about a lot of things with uh this project, and you guys tell me where you want to dive in, but the first thing I want to show you guys is just how it looks. So you have projects, right, like exactly how you would imagine it. This is running locally, but I also have it uh already deployed, and then once this is initialized, um it looks exactly like all the other codegen tools. Um but you already kind of hinted that Neon has a lot of AI or agent use cases and uh usage. So we have actually started building features out now um that were requested by codegen platforms and agents, right, like specifically targeting that use case, and one of them is uh versioning your database, um and we call that feature snapshots, and I think it's pretty cool. So that's the one thing I really want to showcase really quick. Yeah, let's do it. Uh so you guys are probably familiar with this feature where you have your prior versions of your codegen history, right? Mhm. Um the problem with the database there is that the database is stateful, right? And just because you revert the code doesn't mean that it also reverts your database, right? So uh if you, let's say, have uh a commit hash of your prior version, right, with Git it's very easy to revert back to a prior version of your code. Um but then your database may be in conflict with the code, right? So like if you made any weird migrations, maybe the agent even nuked the database, then reverting back the code alone won't help you. Yeah. Um so that's where snapshots come in.
So I can jump here to my, where is it? Yeah, I have an organization here called Eileen Free Tier Projects. So this is what the agent has access to, and the agent can create new projects here, and those are Neon database projects. And you can see here's the mastra-is-awesome backup uh project. I don't know why I called it backup. And when we go to the tables, so this database is fully managed by the agent. Uh so the agent decided to create a topics table, because I told it we want to have topics, and then the agent created some data here uh already in the database. So if we go back to the uh localhost version running here, what I can do is I can pick any of the prior versions. So let's say here, "add topics section to landing page displaying topics from database, create topics table and seed with the initial topics," right? So this is where we created the database. So if I jump back to the version prior to that, then obviously it resets the dev server back to that commit. Um you can see now it's reset, but behind the scenes what's also happening is, if I refresh here, um you can see we're now on a different branch of your database. So one of the cool things about Neon is that we have branching. So you can instantly duplicate your database. So we created like a backup branch based on your prior version. But if you go back to the main branch, which is used by the agent, you can see that the uh tables are gone. So it reset to the version before the topics table was created. Um and you can imagine how useful this is. You can jump between different versions back and forth and always restore your database um with point-in-time recovery, quickly. So that's called snapshots. Um, and a bunch of codegen
platforms use it for versioning behind the scenes, which is awesome. Dude, that's legit. Yeah, I think a lot of people, you know, probably see that feature, right, and maybe don't understand how that's working under the hood. So it's pretty cool to actually see it. Yeah. It reminds me of WordPress. WordPress has had database resets, or like restores? They had revisions. Oh, revisions. Yeah. Yeah. Okay, that makes sense. Well, in that case, right, it's all this content data that has different versions in SQL. But now there's your own application, which could be any schema, anything that you want, and then this layer, like the content management system, on top of it, which is your MCP, right? Yeah. It's like a CMS for databases almost, in some ways. It is. Uh if you check out the, like, this is my personal Neon account, and there we have the actual Eileen project. So this is the meta database, right? This is the meta, this would be like the Mastra Cloud database or the Replit cloud database, where they store all the users, every project, the billing. Um and as you said, right, that's where we manage the different versions. Mhm. So every project here has a project versions table where we store the git commit hash associated with that version, but also the Neon snapshot ID, and then it's just one API call to the Neon API to say reset the database back to that snapshot.
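The scheme Andre describes, one project-versions row per codegen version pairing a git commit hash with a Neon snapshot ID, so restoring a version is one lookup plus one API call, could be sketched like this. The type, the helper names, and the endpoint shape are illustrative assumptions, not Neon's actual SDK or API.

```typescript
// Hypothetical sketch of the versioning scheme: each codegen version
// row pairs a git commit hash with a Neon snapshot ID.

interface ProjectVersion {
  commitHash: string;
  neonSnapshotId: string;
}

// Pure lookup: which snapshot corresponds to a given commit?
function snapshotForCommit(
  versions: ProjectVersion[],
  commitHash: string,
): string | undefined {
  return versions.find((v) => v.commitHash === commitHash)?.neonSnapshotId;
}

// The restore itself would be a single call against the Neon API.
// The endpoint path here is assumed for illustration only.
async function restoreToSnapshot(
  projectId: string,
  snapshotId: string,
  apiKey: string,
): Promise<void> {
  await fetch(
    `https://console.neon.tech/api/v2/projects/${projectId}/snapshots/${snapshotId}/restore`,
    { method: "POST", headers: { Authorization: `Bearer ${apiKey}` } },
  );
}
```

The nice property is that reverting code and reverting data become the same operation: look up the row for the commit you jumped to, then fire the one restore call.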
That's legit. That is legit. We might need to use some of this for our uh cloud databases. Yeah, I mean I imagine you have some people
that end up with, you know, dozens or hundreds of different versions, right? How does that, I'm assuming that must scale pretty well, but I imagine some databases are very, you know, pretty data intensive. It could be very huge and have a ton of versions. Are you doing things behind the scenes to, you know, try to make that more efficient, or is it as simple as you just have a bunch of copies of these things and, you know, storage is cheap, so you just spin it up when you need it? Yeah, it's a bit of both. Uh, so in general, also with the whole branching feature, we do a lot with copy-on-write and the Postgres write-ahead log, where uh we just save the diff between different branches. So we don't actually have to duplicate your data. Um but with snapshots in particular, and specifically also with these agent databases, let's say you don't use your database for a month. What we do is we store all of that in S3. So the source of truth of the Neon storage layer is always going to be S3, and then if your database is running, obviously it gets uh exported out uh into the page servers in our storage system. Um but yeah, S3 is very cheap, but then also we do a lot of cool stuff with the write-ahead logs. That's it.
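The copy-on-write idea Andre mentions can be sketched with a toy page map: a branch starts by sharing all of its parent's pages and stores only the pages it overwrites. This is a sketch of the general technique, not Neon's actual storage engine, which works at the WAL and page-server level.

```typescript
// Toy copy-on-write branch: reads fall through to the parent until a
// page is overwritten locally, so a new branch stores only its diff.

class CowBranch {
  private own = new Map<number, string>(); // pages this branch rewrote

  constructor(private parent?: CowBranch) {}

  read(pageNo: number): string | undefined {
    // Fall back to the parent chain for pages we never touched.
    return this.own.get(pageNo) ?? this.parent?.read(pageNo);
  }

  write(pageNo: number, data: string): void {
    this.own.set(pageNo, data); // copy-on-write: parent stays untouched
  }

  // Storage this branch itself consumes (its diff from the parent).
  ownPageCount(): number {
    return this.own.size;
  }
}
```

This is why duplicating a database can be instant: a fresh branch costs nothing until it diverges.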
Dang. So, like, the vibe coding, agent coding products, are we considering these the main ones? I mean, I think there are other use cases here, but are they using it super heavily, which gives them hella usage as well, I'm sure. Yeah. And then the crazy thing is, uh first of all, we have a lot of codegen platforms building on Neon, but also agents use the platform way more aggressively than humans would, right? Like where a human would maybe create one branch in the beginning of the day, and that's the dev branch where they then develop a feature on, agents are way more aggressive in the way they do stuff, right? Like you probably know that as well, it's like thousands of tool calls in minutes, like uh, let's investigate this table one more time, then let's create one more test branch here. Oh, it's failing. Let me try it another time. Right. Uh, so we just see a lot more usage in general, just because agents are also, with the resources they have available, way more aggressive. Yeah. I imagine one agent is like a dozen or maybe a hundred users, right? Exactly. Obviously much more efficient and can
make simultaneous requests and all that, where, you know, a single user can't necessarily do that. Totally. Do you all have any horror stories from running databases? Like, you know how everyone used to have horror stories from running databases in production, do you have any horror stories for that with agents? Uh yeah, I mean if you Google "Neon postmortem" and look for a blog post in June or July, I think we launched it in July, we had some incidents because of just the new usage patterns and how they affected the platform. Um so I think that counts as a horror story. Uh yeah, it's just very interesting, because it just uh um affects our platform very differently, just the way uh agents work with databases. Yeah, I imagine the techniques haven't changed, but the usage patterns and, you know, maybe the scale of individual agents compared to users changes. So I imagine there's some learning curves to get it to where it needs to be. Totally. Carbon Magnets says his horror story is drop table users. Yeah, that's why snapshots are so powerful, right? Like if your agent screws up and you have a platform that supports versioning and also versions your database, right, then you can just easily revert back to a prior version. Also, could the agent create a snapshot from main, do its change there, validate it, and then promote that to main? Yeah. So that would be like, uh, so snapshots, it's just a point-in-time little
like snapshot of your database, and we store that for you, and you can always revert back to that version. But then we also have the branching feature, where you can duplicate your database, and every branch is something you can connect to and has an isolated database environment. Um so our Neon MCP server, actually I can show that real quick um as well.
I want to see this. Always better if it's visualized. So if I go to my Mastra Studio here. Uh so this is the uh Mastra playground for the codegen project I just showcased. And as mentioned, the tool I use for dev servers and stuff is called Freestyle. Um but I've also connected the
agent to the Neon MCP server, right? So what I could have done, if I weren't as lazy, is I would have created some custom handcrafted tools that provided exactly the access to the database that it needed. But I was lazy, so I just gave it access to the whole Neon MCP server, with some tools that it probably shouldn't use or doesn't need to use. Um but the good thing is that I can show you all the tools now. Um and one of them is uh prepare migration. Let me just see. Yeah, so we have these uh prepare migration and complete migration tools. Um so if you're just doing local dev and you have the Neon MCP server enabled, you can also take advantage of that. Um, and what the prepare migration tool does is it explains to the agent that a database migration is scary and it has to be careful, and it should always first run the prepare migration tool, um where we will create a branch, and then the agent can execute the migration SQL queries against that branch, and then we encourage the agent to inspect that branch, like check out the tables, check if the data still looks solid, and only if the agent is happy with the results should it actually do the migration against the main branch. So it's, I guess, to your point, right, like where you can always use some cool branch strategies as well. Yeah. Yeah. I think people who are watching should look at this tool description, because that's what you should be doing, actually having good tool descriptions. Yeah, the downside of the Neon MCP server for codegen agents is that we have way too many tools. Uh I'm not sure, so far it hasn't been a problem, so I'm not changing it. But one thing I would probably do differently soon is handwrite some tools that I think the agent should have access to, in this case for this specific use case, right? For local dev, this is perfect.
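The prepare-then-complete migration flow described here can be sketched as one small function: run the migration on a throwaway branch first, inspect it, and only touch main if it passes. All helper names here (createBranch, runSql, looksGood) are hypothetical stand-ins, not the actual Neon MCP tool implementations.

```typescript
// Sketch of the branch-first migration flow: apply the SQL to a test
// branch, let an inspector check it, and only then apply it to main.

type SqlRunner = (branch: string, sql: string) => Promise<void>;
type Inspector = (branch: string) => Promise<boolean>;

async function migrateViaBranch(
  createBranch: (from: string) => Promise<string>, // duplicate a branch
  runSql: SqlRunner,                               // run SQL on a branch
  looksGood: Inspector,                            // inspect the result
  migrationSql: string,
): Promise<"applied" | "aborted"> {
  const testBranch = await createBranch("main");
  await runSql(testBranch, migrationSql);
  if (!(await looksGood(testBranch))) {
    return "aborted"; // main is never touched on failure
  }
  await runSql("main", migrationSql);
  return "applied";
}
```

The key property is that a bad migration only ever lands on the disposable branch; main sees the SQL only after the inspection step passes.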
Cool. That was awesome, dude. Yeah, thanks for sharing. Nice use of Mastra. Yeah, it was awesome working with the playground. Like it's so easy for uh for debugging, right? Like I had some issues with uh one of the MCP servers I was using, and then I had to debug it, and the fact that you can just select one of the tools in the playground and fire it up from there to confirm that it's a problem with the tool, not with the agent or with some other stuff, just was super helpful. Yeah, we thought so too. Yeah. Uh we have some even more improvements and changes coming to the playground experience soon. So we've heard that from a lot of people, and that is uh, you know, one of the things we figured we should double down on and make even better, because we keep hearing people really love it. So let's uh find ways to continue to improve it. Awesome. Well, Andre, it was great having you on
the show. Uh we'll definitely have you come back on again. If anything new comes out, you launch anything, you know where to find us. We're here every Monday. Yeah, every Monday. Yeah. Thanks so much for having me, guys.
Yeah, of course. See you at the conference. Yeah, we'll see you at the conference. Thanks for sponsoring.
All right, dude. One guest down. All right, many more to go. Many more to go. Should we roll right
into it? Let's roll. All right, next we're bringing on Eric from CodeRabbit. So, Eric,
welcome to the show. What's up, dude? Oh no. Oh no. It's not a live stream, then. That's how you know we do this thing live, everybody. Technical difficulties. Well, while Eric figures out his audio, uh that was pretty awesome, uh seeing a demo from Andre. Yeah, Neon is a really cool product. Yeah. I just liked, I used to use it quite a bit, just how quickly you could spin up a database, right? It's almost instant, and then you just had it, and a lot of times, sometimes they were real things and sometimes they were just throwaway, and so I think it was awesome how seamlessly you could use it either way. And so I imagine that's why it makes sense for agents, right? Yeah, because you can use it for real things. You can also use it if you need a database for a while, you know, and you can throw it away at the end. And that's why having it scale down to zero, I think, is really smart. Back, I guess.
So, we're going to lose Eric for a minute while he rejoins and tries to uh get audio. And you know, you might think this is Eric's fault, but it's probably because of our lack of preparation and lack of general, uh, you know, general prep for the show. And Eric is back. Do we have some sound? I think we have sound. We do. You're here. All right.
Now, now my video is all messed up. It's, you know, sometimes it's it's one step forward, two steps back, you know. That's that's how we do it around here.
Cool. Yes, this is one step forward, two steps back. Yep, that that checks out.
That checks out. Well, Eric, we're thankful for you joining us today and, you know, for obviously being involved with the conference as well, TSAI Conf. We're excited about that. Maybe you can fill people in, uh tell us a little bit about yourself, a little bit about CodeRabbit, and then I'm sure Obby and I have tons of questions we can throw at you. Cool. Awesome. Um, I'll try to not yap
too much, but feel free to stop me if I am. Um, yeah, I'm Eric. Um, I work at CodeRabbit. Um, before this I was an engineering leader at uh Meraki, at Cisco Meraki. Um, and the last year, year and a half there, I was um working on agents. Um, so it was a lot of fun, and um yeah, I've just been hacking on um agent stuff, or LLM stuff, uh for a while, and um I think I saw Mastra maybe, what, like half a year ago or something like that. Um came across it and was like, whoa, sweet. Um cuz I think before you guys, the only way to do anything with JavaScript was like LangChain, um, or maybe like tie yourself to the OpenAI SDK or something like that. And uh yeah, really love the high level of taste and quality that um you guys put out in the Mastra framework.
Well, appreciate that. Um, so we first, I first met you, Eric, at, of all things, a JavaScript trivia night. How'd you guys do? Well, you know, you would think that the guys behind writing a JavaScript framework would have got first. We came in second, uh, unfortunately. You know, I want a recount. No, I'm kidding. Um, but the questions were really hard. You know, you really, uh, because uh Lewis from uh Y Combinator was the one behind the questions, and his knowledge of the JavaScript language is uh pretty intricate, and so that was a pretty fun night. Yeah, you should definitely have Lewis on at some point. That guy knows JavaScript uh inside and out. Um yeah, we should have him come on and do like a JavaScript trivia on AI Agents Hour. You just got to answer in the comments and answer the question. Yeah. Yeah. But that was brutal. I was like laughing, because I got to have the privilege of just MCing and just looking at some of the questions, man, as they came up. Like I literally was laughing at a bunch of them. So hard.
There were a lot of good There were a lot of funny prizes that night, too. Like free code review from a person, one free voucher. Yeah. One free code review. Free tickets to the conference. That was cool.
Yes. Yeah, thank you guys. Um, yeah, I I don't know if I if I told you guys this, but uh we also ran like uh like 10 different agents and we had like uh one that had access to an embedding model that of like all the ECMAScript uh specs. Um, and I think I only showed a
couple because I felt bad because like no one no one got more than like 50% right. and uh the models like crushed it. Uh so it's like oh it's not we are we're getting replaced slowly. Yeah. Yeah. I mean you know but but then again that
that'd be like saying that you know you got to do math without a calculator. You know it's like some of that stuff was basically you had to run the JavaScript code in your head which I imagine a machine should be able to do that better than than a human. So I I don't feel too bad that they crushed it. Um because that you know ideally that those are the
tools we should be using to answer those kinds of questions. 100%. Dude, like a couple of weeks ago I started seeing you everywhere. Um, and I found out that you're now at CodeRabbit. What is CodeRabbit? And
I think everyone's seeing you guys everywhere, so talk about that kind of motion too. Yeah. Um, well, I wasn't here for the founding, which was a little over two years ago, so I can't speak totally
to the thinking behind it, but what I can see is that CodeRabbit is really established as a product. It's code review. It tries to be basically like having a senior engineer review your code, doing the first pass.
Um, kind of that tedious part. It's like having a senior engineer who does all the things a senior engineer knows they should do when they're doing code review, but never really has all that time. And, you know, computers are pretty fast, so they can do a lot of tasks, and
yeah, basically having an LLM there, with a little bit of agency to make judgment calls on things, turns out to be a really, really useful thing. Um, so I know the team was just heads down focused on product, and they still are very focused on product. I've never been in a place that's so
intensely focused on product. Um, and, you know, metrics, and trying to understand everything that goes out, what exactly the effect of it is. So I'm learning a ton, which is
a privilege. Uh, and yeah, there are so many people out there, on X or other places, that hype stuff up a ton, and that's definitely not the culture at CodeRabbit. It's a very professional kind of thing. It's a dev tool, yet it happens to be AI, which helps you achieve a little bit more than you could with just the static
tools. Um, but it's a dev tool, so it should just do the thing it's supposed to do. But in the world right now, you do have to make noise. You do have to be seen. Um, and, you know, we
don't want to go out there as hype beasts. Um, but if you don't know about us, you should definitely check it out and try it. And if you don't like it, let us know.
And kind of the way to do that right now is to be a little bit louder. You know, we've done a BART station takeover, a Muni takeover in SF, a billboard on the 101, different things across YouTube and X and things like that. So yeah, I'm trying to raise more and more awareness about CodeRabbit, but
uh, but yeah, I don't know. You think it's because, I mean, you guys have a couple competitors. Are they doing the same? Like, are you trying to just one-up them on whatever tactics they have? Um,
that's a good loaded question. Um, the way I think about it is, CodeRabbit basically defined this product vertical of AI code review, with this agentic set of workflows, and that's been out there for a little over two years. And there are
companies, startups for instance, that have pivoted to that product category, or even new startups that just focus on it. There are also other dev tool companies that are kind of expanding horizontally across dev tools, and so they have some kind of new product in this category, which, you know,
you could get worried about. Um, but it's also just massive validation. We're already on over 2 million repos. I think the last number I saw was 13 million
reviews done. Um, but it's probably more by now. And we get a lot of good feedback from folks. So there's a lot of validation for
AI code review, and it's pretty darn useful. I just have it on by default on all my personal projects nowadays. And, uh, yeah, we also see some really, really big players, massive tech companies, getting into it as well, as well as the frontier labs and
other things too. So it's definitely getting more crowded, but at the same time the total space is becoming bigger and bigger and bigger. So yeah, um, I think we try not to think about it too much. Yeah, I mean, I think that's the case with a lot of different
types of products in the AI world, right? I do think it's not necessarily a zero-sum game either. You know, maybe in the future you can't have a hundred code review tools, but I think the number of people using
it today is minuscule compared to what it's going to be in a year. 100%. And if you're not using some kind of code review tool, even on a personal project, or especially on a company project, why not? I would actually be curious, because if you're using AI tools to write
the code, it's probably useful, and at least in my experience it is useful, to have almost like an unbiased third-party AI tool that can provide at least the first pass. It's not going to catch everything. It's not going to be perfect, but usually it's pretty good. And so in my
experience, I would highly recommend people try it out if they haven't, because I do think you'll likely catch things. It's a lot of the small things. I used to be that guy, so I've been put out of a job, I will say. I was always the guy that would be like, oh, you have
this missing comment here, or this badly formatted comment. Not that helpful, nitpicky stuff that I would do just to keep the codebase tidy. That stuff goes away now. I don't have to do that, because the agent can usually pick up on those little mistakes, right? Little things
that maybe they just missed. And then obviously some bigger things too. Even an engineer who has 20 code reviews to review while writing their own code maybe went a little quick and missed something. So it's nice to have kind of a little angel on your shoulder that can at
least give you the first pass before you send it to a team member who points at something and says, you know, why'd you do this, dumbass? You know, not in that way. But code review tools, just like models, I think everyone has different experiences with different code review products,
because even on our team we're super split between what people like and what we're using. Some people swear by one; some people are like, Claude Code is way better at this. And it's because every product is so different, and maybe people's
human experience with it is different as well. Um, it's crazy. Yeah. I mean, we're all engineers. You pick the tool that works best. Um, I'd say, you know,
kind of with any tool that's for a specific thing, if you're working on a team, you should probably just pick one. Um, and, you know, this actually reminds me of a tweet from Simon Farshid from assistant-ui, I think a few weeks ago, which was something like: one of the most
underappreciated features of CodeRabbit is that it detects other code review bots in the PR. And so, even if you have it set to auto-review on your repo, it'll be like, hey, I've detected another code review bot, so I'm not going to spam you
with the same things that the other PR reviewer bot is going to flag there. So, um, Simon runs like four on his repo, right? Yeah, I think I PR'd into assistant-ui, I think last winter or something like that, and was like, whoa, this is a lot.
Collecting them like candy. Yeah. Yeah. Um, but I mean, at the same time, I don't know, I'm sure
you guys have had the same experience over the last couple years. Maybe you were kind of set with your dev setup, you know? And then in the last two years it's just been a little chaotic, how much things are changing and how many new things I'm trying. And so, yeah, I am curious, just in general,
this is good audience participation: if you're listening in, how many times have you changed something significant about your development setup in the last 12 months? Because I can name a few. You know, I've been a diehard Cursor user, then it was Windsurf for a while, then back to Cursor, now Claude Code. And, you know, I've
heard a lot of people saying, you know, Codex is now the new Cursor, or now the new Claude, but a lot of people are disagreeing, right? It just seems like there are so many really good tools, and it's hard to keep up with them all. And I'm just curious. So, if you are listening, how many have you gone
through? I can name four. I've changed my setup four times for sure in the last, you know, nine months.
I would say the same. Yeah. Yeah. Every week.
Yeah. Ruben says every week. That's what it feels like, dude. Yeah. I don't know. I'm in the mode
of, anytime something new comes out, I'm just going to try it right away. Maybe it's only five minutes or something like that, but yeah, there's so much new in the space. At the same time, my kind of non-AI tooling has changed too. I don't know if it's the fatigue from all of the new tools, or
just that it's easier mentally to compartmentalize things, but I've really gravitated towards single, high-performance, simple tools that do a thing and do it well. Like Ghostty. It's so fast, and it's not trying to be anything more; it's just going to be a really, really fast terminal. Um, yeah. Zed as well. So
I switched after I saw the memory usage and energy usage of VS Code and the VS Code forks, and there are just so many extensions and everything now. And I think Zed, especially in the last few weeks, has become really nice. They're doing this kind of performance-focused,
developer-experience-focused thing, with a lot of constraints around extensions and all that, but for the purpose of making it a really, really nice developer experience to be in. So I'm really, really stoked. I've been meaning to change my terminal.
I mean, no hate to Warp or anything. That was a change I made during the last 12 months, to use Warp. But I don't need AI in my terminal personally, because I already have other tools for that, much like anyone else. Um, so I do want something a little
bit more raw, I guess. I don't know. Maybe I should become a Neovim guy again.
It's never too late. Yeah, we support Neovim. We also have an open source fork with some custom Neovim stuff from CodeRabbit. Um, oh, and if you want more
CLI tools, we also have, I think, still the only agentic code review CLI. So you can spin it up with Claude Code or with Codex, or really anything you want. Just say, hey, when you think you're done, call the CodeRabbit CLI and do a review. That's it. Let me pull that up for you right here, I
guess. Um, but yeah, it is kind of fatiguing, with every new thing that's dropping, every new model, every new tool. Um, but
it is also a super exciting time to be an engineer. Yeah, dude. Yeah, indeed. Also, we started using CodeRabbit, too.
Really? Yeah. I didn't even know, as of before this meeting.
So, besides the microphone issues, I was actually trying to figure out what to do, because I didn't see the memo you guys left in the calendar invite, like, bring some topics or whatever. I'm like, oh no, what do I do? So,
we like to keep this pretty casual. So, you know, we basically tell people, for those of you who might be a guest in the future: if you want to bring something to talk about, that's cool. If you want to hang out and just talk AI, that's also cool.
We're going to ask you some questions. We're going to keep it pretty chill. We mostly want to learn what people are building and, you know, about the cool products in the space. Uh yeah,
got a fan in the chat. Hey, and I think you're doing a bunch of stuff with open source too, right? Over at CodeRabbit. Yes. Huge. Um, it
feels really, really good to be able to contribute to open source. A lot of the tools that we use are open source tools, and we contribute to them when we find issues with them or when we want to propose new features. But also earlier this year, I think in February, we
announced $200,000 in cash to sponsor open source. And CodeRabbit itself is free for open source by default. No form or application or anything like that to fill out. If the repo is set to public, we
consider it our open-source plan, and it's free. Um, which is really cool. So it's on over 100,000, probably more than that at this point, but over 100,000 open source projects. Which is really cool. And then last month, when we announced
the Series B, we at the same time upped the open source pledge to a million dollars in cash. Because credits are good and stuff, and we probably spend more than that in inference costs for open source, but at the end of the day open source maintainers need resources, and they know what they need best. So,
um, yeah, I remember even talking to you guys about this. It was after the JS trivia. Yeah. You guys did the OG, like full-G, open source ethos move, where I was like, how could we support you guys? And you're like, don't support us.
Support Hono. Yeah, those guys need it. So that was really awesome. There are so many OSS projects
that are not venture-backed and are used by everybody, that rely on donations even with high usage. Dude, having a fund like that is a game-changing thing. So, dude, that's tight. Congrats to you guys. It's super
That's how you get some goodwill, too. Yes. Um, yeah. Also, man, dude, it
feels so good. Like, I'll go set up my new laptop and be like, okay, oh, this is an open source repo, do we support them already? Oh no, we don't. Okay, let me reach out, how can we support you? That's such a cool feeling, dude. And I can just imagine being on the other side, just
getting a message that's like, hey, I love your project, can we just give you some money every month? Um, it's super cool. Like, I have a repo. You shouldn't use it, but
it gets like 25 or 30,000 npm downloads a month. Um, and it's been feature-complete for years. And people still use it. And it's
kind of scary. I think the repo is like six years old or something, and I've had the GitHub Sponsors thing there the whole time. Um, and I think the math works out to probably over a million npm
downloads or something like that at this point. And I have gotten exactly zero dollars in sponsorship over that time, and I've put in so much time. So it's a very real thing that open source maintainers need support. Yeah.
Um, they don't need issues opened up that say, if you don't fix this thing, I'm not going to use your library, you know? So yeah. Usually it's big companies that do that, huh? Like, they just hog up all the sponsorships, you know? Then
you have to be like a popular person to get money. Yeah. Yeah. I think that's typically the
case. But yeah, I think one thing that is underappreciated in open source is, as much as open source maintainers would love really well-documented issues that are reproducible, most aren't, right? Or they're not very well written. So, you know, if you are opening an
issue, please just take the time to spend that extra effort, and use a lot of detail. And CodeRabbit can help with that, by the way; we also work in issues. Um, I was just thinking, if you could just be really detailed in the issue... actually, open source repos should just have an agent that can
take the first pass at those issues, tell you if you need more detail, and ask for things. Because I do think that would really help open source maintainers get their lives back a little bit. I mean, we had people, especially at Gatsby, where it was literally someone's job, just triaging issues.
We had a rotation. It was a thankless task no one wanted to do, because it was always people pointing out all the bad stuff, and there always is bad stuff, right? It's like, this doesn't work, this doesn't work. And so it felt like you were spending the whole day looking at these issues, trying to figure out who can work on it.
Yeah. What's the priority? All these things. I do think AI can help open source maintainers at least get a little peace of mind that something is taking a look at it, doing a good enough job to at least get it moved along to the right place or right person, and maybe even at
some point taking a first pass at what the solution might be. Yeah, dude. Maybe we should build something that creates reproductions, too. Or maybe CodeRabbit should do that, too. Yeah. I mean, so we throw
things in a secure container, and you can just run whatever you want. Um, so actually, one of the challenges I'm trying to make simpler is the CodeRabbit config. It's wildly
configurable. The defaults are really good, and that also presents a problem, where people don't even realize they can configure it how they want. But you can chat with it on an issue or PR, and you can develop learnings that way. So it's like, no, you know, we don't care about, I don't know, memory
leaks on this team, so don't tell us about memory leaks anymore, or something. Um, but you can also go all the way down to pre-merge checks, where you either use natural language if you want to, or you can just run scripts, whatever you want to do. You don't
have to know a domain-specific language or anything like that; it's got a shell. You don't have to know that it has a web search tool, but it will search if you want it to go to a website or something like that during its runtime. So,
um, it's pretty cool. I'd be happy to sit down with you guys and jam on something like this. It's the same thing we hear from other open source projects. I was talking to folks from Anaconda, the big Python package manager bundle thing, or whatever you want to call it. Um, yeah, I was trying
to do some kind of cool, clever, what-would-be-helpful thing, and they're like, dude, the thing we spend the most time on is triaging issues, just like what you said. And that's totally an issue for open source, not for private businesses, right? There,
employees all know each other, and they've got some ticketing system or process, or employee onboarding, or whatever. So it's a totally underserved area. Dude, I would love, next time I see you, we should talk about that.
Also, I promised you this, so I got a hat for you, man. Well, I'll wear it next episode for sure. Yeah. Well, I will grab one, too. I don't wear them usually. Maybe I need
to start being a hat guy on the stream. I don't know. I feel left out right now. I'm just starting something. I'm seeing if it goes anywhere.
Yeah. I mean, it used to be we'd talk about the energy drinks. Now Obby's just slinging hats on the stream. It's a different one every time I see you, Obby. Yeah. It's like, yeah, if you have
a hat, just send it to Obby. Elasticsearch. I don't even know what it says. You
know, for chat. Yep. Can you just say your home address on the stream, obviously, so people can send you hats? Well, you'll know where I live. The
patch, the patch. And if you find the saloon, you know, when I'm in town, you might find us. Just send it to the saloon, dude. That's
We're not quite there enough to have our own spot, like the one OG that's there quite a bit. But we do some live coding at the saloon when we need to. Uh, awesome. Okay. Well, Erik, we do appreciate you
coming on and telling us a little bit about CodeRabbit. Appreciate your help with the conference. Excited to see you there next week. And yeah, excited to explore CodeRabbit more with
Mastra too, because we use tools like this, and we want to figure out how to make all of our lives easier, especially if you're using an open source product. So if you're using open source, or even just on your personal projects, check out CodeRabbit. Yep. Feel free to DM me too, Thor on
Twitter. Um, happy to help. Cool, dude. All right. See you.
Take care. Nice, dude. Hats are in. Thanks, Ruben.
Yeah. And this is funny, the unofficial P.O. box of Mastra: the saloon. The saloon.
We're going to be moving soon. We don't want to dox ourselves, but Yeah. Yeah. Well, we'll be Yeah. moving.
We'll have to have a new location, you know. Which is good, because people are starting to try to find us there, so we'll have to have a new location. If only we were that cool. We're not. We're not.
But, uh, we can dream, right? Yeah. So, yeah. Professor Andy might be a little delayed. Had something come up. We'll see if he's still able to make it.
But until then, until our next guest, we can chat more AI news. You know, we didn't get to it all, for a reason: we wanted to save some time.
Yeah. So, let's talk about workflows, like I said. Yeah. Well, Vercel had their Ship AI last week. They announced a bunch of
things. So, you know, we knew about some of them that were coming. We learned about some of them with you all. And, you know, Mastra is built alongside the AI SDK, I
guess you could say, now. So we work really well with the AI SDK. And so we're kind of excited about some of the new stuff and want to talk about it. Yeah. Um, let's see where to begin. So, I think we
should start with the easier thing to digest; we'll get into workflows next. The first, easiest thing to digest is that there will be an agent class in the AI SDK. Um, and it looks pretty similar to a Mastra agent, but there are some notable differences. Um, if users
want to use that with Mastra, they'll be able to. That's essentially our take on it. There are a lot of agent classes out there, so you should be able to use any of them in our studio. So
that's the vibe there. Yeah. And I think what we've learned over the last six months, maybe the last year, but definitely the last six months, is that everyone is kind of coalescing on the same structure for what an agent is, which is actually a good thing, because it means if you're actually out there building an agent, it's becoming more known what
that actually means. Okay, you have some kind of agent loop, you have tool calls, maybe you have some kind of memory, maybe you have some input/output guardrails. Um, maybe you have some kind of handoffs between multiple agents. So
there are a lot of these things that are pretty much the same. If you look at Vercel's implementation, you look at OpenAI's Agents SDK, you look at our implementation, it's going to feel pretty similar. And I would argue that's a feature, right? It means that
the agent part is becoming less exciting. It's becoming more known what that means. What does an agent actually mean? Exactly. So I do think that we'll see more of
that. And there are probably a lot of other, lesser-known agent frameworks out there that are coming to the same conclusions. Yeah. Right. I think if you go down the path of building an agent framework, you
realize there's just a handful of things you really need, and you need to do them really well. Solve that, then you can start to think about some of the other problems, which are, you know, memory, evals, and workflows, which we'll talk about. Yeah. One thing I do love about these new releases
from everybody around the place is that you get inspired. So, like, Vercel's AI SDK v6 is exporting a tool loop agent, which I guess is playing on Simon Willison's de facto definition: tools running in a loop with an LLM. And that makes a lot of sense. And what that really inspired in me is, we could export the agent loop workflow, right? So,
instead of new agent, you could get the workflow that's actually running, and then you can add steps to it and do whatever you want. So we'll do something cool like that too. It's not like we can't, you know.
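The "tools run in a loop" idea being discussed here can be sketched in a few lines of plain TypeScript. To be clear, this is an illustration of the shared shape, not the AI SDK's, Mastra's, or OpenAI's actual API; every name in it (`runAgentLoop`, `mockModel`, `Tool`) is invented, and the model is a hard-coded stand-in for a real LLM call.

```typescript
// Minimal "tools run in a loop" sketch. All names are illustrative, not any
// framework's real API; mockModel stands in for an actual LLM call.
type ToolCall = { tool: string; args: Record<string, unknown> };
type ModelReply = { toolCall?: ToolCall; text?: string };
type Tool = (args: Record<string, unknown>) => string;

// Fake "model": asks for the add tool once, then produces a final answer.
function mockModel(history: string[]): ModelReply {
  if (!history.some((m) => m.startsWith("tool:"))) {
    return { toolCall: { tool: "add", args: { a: 2, b: 3 } } };
  }
  return { text: "The sum is 5." };
}

function runAgentLoop(tools: Record<string, Tool>, maxTurns = 5): string {
  const history: string[] = ["user: what is 2 + 3?"];
  for (let turn = 0; turn < maxTurns; turn++) {
    const reply = mockModel(history);
    if (reply.toolCall) {
      // Execute the requested tool and feed the result back into the loop.
      const result = tools[reply.toolCall.tool](reply.toolCall.args);
      history.push(`tool: ${result}`);
      continue;
    }
    // Model produced a final answer: exit the loop.
    return reply.text ?? "";
  }
  return "max turns reached";
}

const answer = runAgentLoop({
  add: (args) => String(Number(args.a) + Number(args.b)),
});
console.log(answer);
```

The real frameworks differ in memory, guardrails, and handoffs, but this loop (call the model, execute the requested tool, feed the result back, stop on a final answer) is the common core everyone seems to be converging on.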
Yeah. And that's maybe something that other people don't know: our agents use our workflows under the hood, which is pretty cool. It's all built on this concept of workflows. So you could obviously build
your own on top of it. Maybe we should make it even easier. Yeah, that's a good segue to the Vercel workflows. Um,
so, yeah, I spent a good weekend with Vercel workflows. Um, I'm going to reserve my judgment. I want to tell you what the community is saying, and then we can talk about it ourselves. So let's talk about, first of all, what are Vercel
workflows? They are a durable workflow library. It is `npm i workflow`, which is crazy. That's the first thing you've got to say: that's crazy, to have that package name. So
I don't know what they spent on it. We'll never know. They had the money for that, you know. They spent money on `npm i ai`, and now `npm i`
`workflow`. So, you know, they threw some money at that for sure. Unless they had been sitting on it for their whole career. Maybe. Yeah, maybe. I mean, that would be, you know, years ago they just
thought, maybe we'll do a workflow thing, we'll just squat on that. But I doubt it. I bet they bought it.
Uh, let me share my screen here, too. So, if you're looking at this, my first take is: they introduced a directive, which does some special magic under the hood. You know, you can use it with this devkit that compiles the code into something executable based on the infrastructure, and that's chill. Other
than that, it all makes sense. You have workflows, you have steps. If you're trying to do parallelism, you use Promise.all. Very JavaScript-friendly. So, definitely cool. I did a compare and contrast with how this differs from Mastra. Now, Mastra has two concepts in workflows.
The first concept is the workflow graph. When you're writing createWorkflow, createStep, parallel, forEach, branch, etc., what happens in Mastra is you're creating a representation, what we like to call an execution step graph. And that doesn't have any opinion on how the graph gets executed; it's
just about what the graph looks like. And then you have these things called execution engines. We have a default execution engine, which looks a lot like `workflow`, just to throw that out there, but we also have Inngest as an execution engine, and soon Cloudflare, Temporal, and, you know, maybe Vercel workflows. So
that's kind of how our things differ. But if you look at the compiled code for a Vercel workflow and our default execution engine (well, we don't have to compile any code), they're very similar in how they work. So that's that. But let's talk about
some pros, and then the negatives, or the detractors, or whatever. So, this dude Caric did a super sick deep dive which really helped a lot of people understand what's going on here, and I'll do a little summary for y'all. When you
use `use step`, there is a compile step, right? So, today you can use this library with Next.js, Nitro, or anything that can compile or bundle. So kind of one of the things right now is you need a bundler to run this in a certain type of way. But what this `use step` does is it uses the SWC compiler and
transforms `use step` into a bunch of stuff. You essentially turn this into a function, and then this becomes like a register-step function. So already there's a parallel: in Mastra you do createStep, and this thing essentially compiles to createStep. So that's one
thing. There's a big compilation plugin with SWC; I said that. Then
there are a bunch of different modes this thing works in: step mode, workflow mode, and client mode. So this is pretty cool, and it defines how you interact with the flow in each mode. And then after that you kind of
have these worlds. I don't like this name, for one, a world interface. But the world is the storage adapter the workflow is going to use. So, like, in Mastra you can have storage
adapters, which workflows then piggyback off of. In this case it's the same: you have worlds, which could be Postgres or whatever you want. And so that's the storage part of it. And then there you go, you can start playing with it, sleep and all that, you know.
I don't know. A lot of this stuff is the same under the hood. Um, but that directive is what really is interesting.
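To make that walkthrough concrete, here is a rough caricature of what a `use step` compiler pass effectively does. This is a sketch, not the actual compiled output: `registerStep` and the registry are invented for illustration, and the real SWC plugin emits something far more involved (checkpointing, serialization, the different modes mentioned above).

```typescript
// Caricature of a `use step` compiler pass. `registerStep` and `steps` are
// invented names for illustration, not the real compiled output.

// What you would write (directive form, shown as a comment since there is no
// compiler here):
//   async function fetchUser(id: string) {
//     "use step";
//     return { id, name: "ada" };
//   }

// Roughly the shape it gets transformed into: the body is lifted out and
// registered under a stable step name, so a runtime could look it up and
// persist its inputs/outputs between executions (the "durable" part).
const steps = new Map<string, (input: string) => Promise<unknown>>();

function registerStep(name: string, fn: (input: string) => Promise<unknown>) {
  steps.set(name, fn);
  // The original call site becomes an invocation through the registry.
  return (input: string) => steps.get(name)!(input);
}

const fetchUser = registerStep("fetchUser", async (id) => ({ id, name: "ada" }));

async function main() {
  const user = await fetchUser("u1");
  console.log(user); // logs the user object
}
main();
```

Note the parallel being drawn in the conversation: writing `createStep("fetchUser", ...)` by hand gets you to essentially the same registered-function shape without a compiler in the loop, which is the trade-off the directive debate below is about.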
So, let me talk about how the directive blew up on Twitter. So, on that side of it, okay, let's start with um uh let's start with this one. So Tanner Lindsay, he wrote a blog post about directives. So
called directives in the platform boundary, you know, so there's a quiet trend. This is actually a trend that people are saying like, oh, where did this trend come from? We used to have use strict in JavaScript. Cool. But then with later
React versions that were like proposed and then use in Nex.js, you had use client, use server, use cache. These are all new things that were technically in React but then supported in X.js. So it's kind of like that bundle or needs
to exist and they're now with use workflow. So um they're not really language features per se I guess you know not truly but they have like you know there there is a comp compilation step that has that. Um and he just makes a lot of really good points about if this is a good uh pattern or not. Um he's more on the type safe committee in terms of you should import things, you should use the
functions. And, you know, one thing he worries about is: does this really get you locked in? Like, for example, here, instead of using "use workflow", why don't you just have workflow and step, aka what Mastra has? Just saying. Um, or, you know, things like that. So his worry is lock-in, right, and then all these frameworks now have to support a plugin to then bundle this and run it. So that's that, and then the two funnier ones. One's more drama. And let me know, Shane, if there's questions or anything. Not yet. Keep going. Cool. Do your thing.
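As an aside, mechanically a directive is just a bare string literal as the first statement of a function body. Here's a toy illustration of how a tool might detect one — real plugins (like the SWC one discussed above) work on a proper AST; this regex-on-source version is only a sketch:

```typescript
// Toy "compiler pass": detect a leading directive string in a function
// body, the way bundler plugins detect "use client" / "use workflow".
// Illustrative only — production tools parse an AST, not source text.

function getDirective(fn: Function): string | null {
  const src = fn.toString();
  // Match the first statement of the body if it is a bare string literal.
  const match = src.match(/{\s*["']([^"']+)["']\s*;?/);
  return match ? match[1] : null;
}

function myStep() {
  "use step"; // the marker a hypothetical compiler would act on
  return 42;
}

function plain() {
  return 1;
}
```

`getDirective(myStep)` finds the marker, while `getDirective(plain)` comes back null — which is all a directive is at the language level: a no-op string that tooling chooses to assign meaning to.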
Then our homie Dax, he comes in with the banger. Because this directives thing really is a polarizing thing for JavaScript. There are people who like directives and people who don't. I don't personally care, but I'm just gonna say Dax doesn't care either. But there's all these people now arguing over directives or not. And it's where all the fanboys from each side start arguing on Twitter, and then you see, like, you know, if you don't like directives, it's because you're not smart. And then he just roasts them, pretty much. Um, so, you know, some people work on other types of software where their backend runs as a standalone service, or on Node or Bun. There's a wide range of how this code gets compiled. Providing a library-type feature that requires a compiler means it has to support every single configuration. So it's harder to maintain — it's just more complex. So it's a heavy choice for people who don't already compile. And that's like the basic
thing. Um, so that's that. And then the last point is from our homie David from Sentry. "Decorators and function calls aren't magic, though. They're traceable. This is magic. No flexibility. No control. Full lock-in. What do I gain from this? Programmers who know even less how to program, who break even more systems. Systems impossible to debug. I'd never use this." So, coming out with the hot takes there. Okay. So that's a lot of
different takes from different people. What do you think, Shane? Yeah, I think, again, it is polarizing. I don't personally care that much. I'm not that strongly opinionated. I know some developers really are. They want things their way, and they have very set, you know, standards for how they want to do things. I always thought that, you know, even with Next.js, when you had the use server, use client stuff, I kind of thought it was just dumb, like, just to have a string. But, you know, we've had use strict in the past, like, I get it.
Yeah. And now it's like decorators inside functions. This seems like another level. It's one thing if it's at the top of the file. It seems a little messy to have these just random strings that do magic, right? So I'm not the biggest fan, but at the end of the day, I don't care that much if it works. Like,
I'm mostly like: use the tool to get the job done. So I think it is kind of cool, some of the stuff they're doing under the hood, but again, you're kind of subscribing to the magic, and the more magic, the more likely that something doesn't work, and you feel like you're either getting locked in, or you feel like, um, yeah, maybe you just don't know how to debug it if something doesn't work the way you expect. So that's the only thing. Um, yeah. Yeah. We've been ripping magic out of our bundler for the last
couple months because of this problem: it's hard to debug when you're not the person who wrote the library, aka users, you know. But after looking at the compiler — maybe it's because we're bundling nerds at Mastra — none of us said it was, like I mean, it's kind of whack for the user, is our sentiment, or maybe for some of our developers it's, like, whack for the user. But as a bundle-ruski type of person, it's tight, because they were able to magically create a workflow engine through the bundler. I don't know how that will shake out long term, but I definitely respect the work. It's tight. Like, it is cool. Um, but it's not
something that I think will be good for our users, or for the user in general, unless you marry into the whole ecosystem, which is probably the strategy they want you to follow. Yeah, for sure. Last tweet's got a point. Yep. Kramer definitely does that. Yeah, heard some cool Twitter chat about message passing between the agents and stuff. Yeah, we're gonna have pub/sub. Well, that's for Vercel. We'll have pub/sub engines in the new year.
All right. Well, I think we exhausted the Vercel topic. Let's talk a little bit more. Uh, this one I found just kind of interesting. I haven't read the whole thing. I've kind of read the TL;DR, but wanted to share it with people. There's this really cool paper on basically vibe coding with large language models. So, I'll share it here. And if you like reading papers, this is maybe a good one to browse. Now, that being said, maybe you just want your AI to summarize it, because it's 92 pages — but a lot of that is just, you know, dozens of pages of references. Yeah, just the references. So it's only, like, I don't know, 40 pages or so.
No big deal. But I just think it's interesting to think about how, you know, using large language models to write code works, and what some of the findings of doing a longer survey and study on it actually are. So I'll drop that link in the chat if you're interested in some light — I'll say very light — reading, but obviously, you know, not light reading. Uh, another thing that has been released, which is interesting if you're coming from a different world: LangChain released this open memory AI memory engine, and I think this is just kind of something where there's going to be more to come. I think we've said it on this show that maybe 2026 is going to be the year of memory, and this is just, you know, LangChain, I guess, getting out early, which is good for them. You know, they actually have memory built into LangChain, but now they have kind of a separate open memory package, which
again is MIT-licensed, and it kind of talks a little bit about what it can do. So you can check that out if you're looking at memory providers. There's a whole bunch obviously listed here. Mastra has our own memory, and we'll be shipping a bunch of improvements to that in the next couple months, and definitely into 2026, because I do think memory is becoming more and more important. I'm ready to go to battle next year on some memory. Yeah. Yeah. Ready to go to war on some
memory. Well, so I think this actually ties really nicely into how we were talking about the agent primitives — we've all coalesced on the same patterns. Memory is going to end up being the same way. It's going to take six months or a year, and then what's going to happen is everyone's memory is going to start looking close, and then the differentiation — it's kind of like the differentiation in other types of products, right? Over time, if it's a database, all vector databases start to look the same. And then you have to differentiate on little sets of features, little nice things, but it becomes a little bit more about what your preference is, and less about, technically, how does it work? Now, of course, there are caveats to all this, but I think memory is going to be the next frontier for agents. So, I bet you this time next year, when we're getting close to the end of the year, we're going to be saying, "Wow, everyone's memory is starting to kind of look and
act the same way." And I think that's kind of a natural trend that we're going to see. Yeah. Yeah, for sure. Memory is part storage and part retrieval. Um, and there are just different strategies going on in the market right now. We're going to be doing some new stuff too. So, that's where it gets
exciting, right? Or you're trying to get mind share. So, we'll see. Yeah, we're cooking up some really cool things over here. So, I'm very excited. Memory is, like, one of the most interesting parts of the AI experience, in my opinion. And this guy uses Mastra because of memory. Well, it's about to get better, bro. Yeah. If you like it now, just wait. Yeah. Um, should we talk about this next
topic here? You want to talk a little bit about AI music? So, we have Yeah. So, Professor Andy is coming on, but he's going to come on after Gaurab. So, we have a little time, which gives us the opportunity to talk more AI and maybe show some AI music demos. Let's do it. If you're in the audience, have you used Suno yet? And if you haven't, why haven't you? Because it's pretty tight. So, how long have you known about Suno? I found out about it that one time we had group office hours. You weren't there, but I had group office hours at the, uh, the Subtrace — no, Subimage — office, and someone mentioned it there, and I was like, "Oh, okay. I'll take a look." But I never did until you started playing with it. Yeah. Which was just, like, what, two
weeks ago, last week, whatever it is. I sent you a song. I was like, "Dude, check this out." Yeah. I wonder. Yeah, we should pull that up, because I think I made a Suno for the show. And we have Yeah, let's play some of those. And I haven't actually, you know, played it. I think literally I just sent it to you. I should pull it up. Yeah, I should find the link. Um, so yeah, Suno is essentially an AI music generator, but it's actually Yeah, it's pretty cool. It's very good. It's obviously not perfect, but if you wanted to, you know, be a music producer, it kind of makes you feel like you could do it. It gives you the sense that you could do it. Just like the image generation models gave me the sense that, you know, I could design great images or great graphics, this gives me the sense that I could, you know, create music. Now, granted, I'm a musician. I do write songs, so I
can actually appreciate how hard it is to write a song. And now I can do it in seconds. So, you know, I guess I'm out of that job, too. Let's play this first banger that we
made. lady focus for the day to let me try to all I know focus to me. Yeah, I just want to be on I don't even know what he's saying.
Yeah, I don't know his name, but sounds good. It's pretty good. Let's see. There's another one you made.
It's got a good beat, you know. Got a It definitely hits. This is another one.
All right, let me share this one. And I think it is not that one here. All right, the next one.
Hey, I hate Dude, that should be our outro, dude. Wait, we should make that an outro now. Yeah. Yeah. I mean, we Yeah, we got to have an outro
next. Let's write the song right now just to Yeah, let's try to create our outro. Let's create a new outro for the show and let's show everyone what it comes up with. And uh
Ralph says, "No one really knows what they're saying, but it's provocative." I agree. Like, you don't know. They just have a sampling of tons of, you know, filler words, and they kind of just, you know, create some sounds, but then they mix in some real words, so you feel like you know it. Uh, let's do male voice, hip-hop. And then they say those are both actually fire. Yeah. But these are the kind of tools that, you
know, on the one hand, as a musician, I'm, like, apprehensive, because I feel bad for all the musicians who are trying to make a living, and now people can just do it, right? But on the other hand, if you're a content creator, if you're a creator of any kind, you have so much power in what you can do. Like, we can create an outro for the show, and I don't have to plug my guitar into an amp and open up, you know, GarageBand or something and try to write something — which I could do. It would take me hours and it would probably suck. I was writing a bunch of R&B songs, dude. But these lyrics But maybe it would. I could probably come up with something pretty good, but it would take me hours, and I'd never do it because of the trade-off. It just wouldn't be worth the time. But this, you know, this takes us a couple minutes, and we've got a whole bunch of options, and we can find some kind of outro song that we use going forward. So, that's pretty cool. Let's try Yeah.
Hey y agents. Oh yeah. Oh shame.
Now baby whack. It's cool, but it's kind of whack. I think it's the funky part. We'll tweak it. But let's try this. They are aes yeah Mastra signing out, baby. I feel like we're in the 90s, dude. Yeah, dude. That one kind of hits. I'm gonna like that one.
Yeah, keep that one around, you know? Maybe we can pull out some stuff from that. All right, so how should we change this? Should we do, like, one more round? Yeah, let's do one more round. Um, I don't know. Can we change it from hip-hop? Let's do something more like rock, you know? Let's do something more in my genre of music, you know? We've been living in Obby's world. It's time to take a step into Shane's mind here.
Okay. What other description words should we use? Uh, let's do Uh, you can put, like, maybe Like, you could do, like, punk rock Punk. That's good. Punk rock. Heavy guitar. Heavy guitar. Heavy guitar riffs. Yeah, put heavy guitar riffs. Yeah, something that jams hard. All right, let's Nickelback.
So, there's always one There's always someone who says something related to Nickelback. Come on, Nickelback. So Oh, I've got a Nickelback story. Okay, this reminds me. All right, so Nickelback — I wouldn't say I'm, like, a huge Nickelback fan, but they came to Sioux Falls, South Dakota, and, you know, they do have some good music. Like, I'll give them credit. And so I was like, I'm gonna go to the Nickelback show. And I guarantee you Nickelback remembers Sioux Falls, South Dakota, for not-good reasons. So, you know, we were in the Premier Center. Obby, you've been there. We went to Theo Von together. And, um, so Nickelback has this thing — I don't know why they do it — but they basically throw little cups of beer out into the crowd, like, we're giving you beer. Someone in the crowd didn't like that they were throwing beer at them. So they proceeded to throw their giant cup of beer at the lead singer of Nickelback, and he had to go, like, change his shirt. And I was like, that's what he's going to remember Sioux Falls, South Dakota for: there's one guy in the audience who threw his beer at him. So, needless to say, the singer wasn't real happy. You know, kind of an embarrassing moment for Sioux Falls, South Dakota, but other than that, it was a good show. So, that's my one Nickelback story. Never made it as a wise man. All right, let's play this. Okay, let's play this, and then we'll bring out our next guest. Yeah.
agents. All right, that's that. And then let's get a little bit of this.
Which one do you like better? Um, I think the first one was a little bit better, but I'm surprised, like, you know, the vocals were heavier than I was expecting. But I guess, like, punk and heavy guitar riffs, like Yeah. Yeah. It's so crazy how it can just do that. Yeah. I mean, literally, we could just have a different outro every week. Yeah. Just how we're feeling. Yeah. Like, maybe that's the play. We're going to queue up, like, seven outros on the side, and then whoever just picks one plays it. Um, all right. Well, we had a fun diversion, you know, talking about AI music, but I think it's time to bring on our next guest. So, we're going to bring on Gaurab from MongoDB. So, welcome to the show.
Hey everyone. Hey Shane. Hey Obby. Nice to meet you all. Yeah, nice to meet you, too. Nice to meet you. Yeah, we've been having a little fun. We've done some AI news. We've had Andre from Databricks, Erik from CodeRabbit, now we have Gaurab from MongoDB, and then Professor Andy rescheduled — he's coming at the end. So, we have a jam-packed show. But Gaurab, can you tell us a little bit about, you know, what you do at MongoDB? Obviously, I think everyone knows MongoDB — you don't need to introduce it — but maybe you can tell us a little bit about some of the stuff you're working on over there. Yeah, definitely. Um, so yeah, hi everyone. I'm Gaurab. I'm a senior product manager at MongoDB. Um, and I
mostly focus on AI frameworks and ecosystem. So, um, it's a range of topics that's pretty broad. Um, so everything from MCP all the way to individual integrations with frameworks such as Mastra falls kind of under me. Um, and yeah, these days we're focusing a lot on simplifying building agents with partners such as yourselves. Um, and also thinking a lot about agent memory and how we can get the accuracy of these agents way up. That's awesome. Are you seeing a big
trend now in your users, that they're maybe gravitating more towards, like, search APIs than, like, necessarily just storage? Yeah. Yeah, definitely. I think we've seen an uptick both in terms of retrieval and RAG context. Um, but also primarily in terms of agent memory. Um, and the way that we fall into this picture is: whenever customers are building agents, we see kind of a diverse range of building blocks that they have to pick, ranging from embedding models to a vector store to an OLTP store. Um, and our vision is to really unify that and have this one agentic database of choice, which is MongoDB, where you can have everything available to you, um, including things like rerankers. Um, so yeah, we're definitely seeing that trend for sure.
That's awesome. I think right now the market for an AI agent is very fragmented across a lot of data stores and a lot of different provider options. So, you're saying MongoDB is going to have, like, all those tools built in? Yeah. Um, if you don't mind, I'm gonna switch to screen sharing.
Yeah. We love to see it. Seeing is believing. Let's see it. I'm just teeing it up for you, dude. Awesome. Awesome. Yeah. I also love how this is focused, you know, on memory. So, you know, for those of you who are just tuning in, we don't do a ton of prep with our guests. We kind of let them lead, you know, talk about the things that are interesting to them. So, it was funny that we were talking quite a bit about agent memory earlier, and that's what we're talking about now. So, it all comes together — almost like we planned it. Almost.
Absolutely. Absolutely. I feel like it's a very hot topic currently, and, yeah, I'm happy to be a part of it. Uh, so, what I was talking about earlier: on the left-hand side you can kind of see the pieces. Whenever you're building an agent, these pieces have to come together. Um, and, for example, your embedding model, your vector store, your reranker — these may be divergent across the different providers that you may have. Um, and the vision that we really have is on the right side. So, um, you have MongoDB — and actually I wanted to mention Voyage AI as well. Um, Voyage is a company that MongoDB recently acquired, um, which provides cutting-edge models and rerankers for search and retrieval. So, um, hopefully that provides an additional layer of context on why MongoDB is effectively a one-stop
shop for building an agent. So here, um, you have obviously your OLTP data that you store, which is unstructured document data that you can store in MongoDB. Alongside that, with Voyage AI, we gain this, um, vector embedding model. Um, and then alongside your OLTP data lie your vector search nodes, which are independently scalable. Um, and again, with the Voyage piece, um, to really make sure that you have top-tier retrieval, um, we have rerankers as well through there. Um, and that is kind of the vision, if that makes sense. Yeah, that's awesome. And are you
gonna have more primitives, um, above this storage layer, or is it just going to be, like, hey, use Voyage's different components and build whatever you want? Um, we'll have to see how the roadmap evolves, of course. Um, but currently we're really focused on the storage aspect of it and making sure that's nailed down. Um, in addition, with Voyage, um, there comes a second piece, which is: we want to make sure that the retrieval you're getting is of the highest quality, whether that's through using rerankers, um, or just the embedding model, which, if you look at the benchmarks, Voyage performs quite well in. Um, so yeah.
Okay, that was cool. So another thing that we can maybe talk about — I don't know if we can show it or not, but we can definitely talk about it — we are working together on kind of a combined template, right, using Mastra plus MongoDB. Can you tell us a little bit about that and what we can expect? I know we're not going to launch it yet, but maybe we can tease it. You know, keep this — this is cone of silence. We're not going to share it out. But we like to give the audience a little preview on the livestream, for those who tune in, of things that might be coming down the pipe. Yeah, for sure. Um, let me jump back to screen sharing. So, y'all are seeing the Mastra playground, correct? Yeah. So, you are going to share. It's nice. Yeah, I am going to share. Um, so
first of all, huge shout-out to you guys for putting this together. It's been incredibly helpful for testing my agent changes, whether it's prompt changes or even workflows. It's been super helpful in terms of building this agent itself. So, um, yeah, kudos to your team. Um, what I wanted to talk about today, in terms of a demo, is to really show a very basic MongoDB agent. Um, it's nothing like the autogeneration of music that we just had, um, which I was jamming to backstage. But, um, really to show a very simple template for getting started with a MongoDB agent. So, um, what we have put together here is an agent that is able to get access to your MongoDB — specifically your OLTP data — and then, um, respond to questions. So this could be things like generating, um, text-to-MongoDB queries. Um, or it could just be simply browsing your data and your collections in MongoDB. So maybe we'll start off with a very small, simple prompt, something like "show all collections." And what this is going to do Um, so we have custom tools that we built. Um, the four tools here range from introspection — kind of getting your database schema — all the way to seeding your database, to MongoDB query generation and execution. So here
you can see that, um, the agent called the introspection tool in order to get the collections that are in my database, and it just listed them out. Um, but you can do something a little bit more complex than this. So, for example, um, let me just paste this prompt here, because I don't want to type live. Let's say you want to list all products where the rating is greater than four. So, you may be doing these things, uh, while you're developing, just to make sure that your agent is giving out the right answer, or maybe you're in analytics and you want to browse the data that you have in your database. So here, when we send that prompt out, it uses the generation tool, then the execution tool, and then there it is. It prints out the exact query that you'd run, and then it additionally actually runs it and gives you the results of that query. Um, so yeah, that is effectively the demo. The two things I wanted to touch upon are the integrations that we have in Mastra today — and we
hope to get much deeper than this over time — um, are, first, vector stores. So for any RAG use cases, any enterprise data that you have, um, you can use our vector store to retrieve that and add the context into your agent. Um, and then the second one relates to memory, um, where currently working memory is enabled. And, um, what you can see in the back end — so I'm switching over to actually inspecting my underlying data store itself in MongoDB — um, you can see, for the database that we have — it's an e-commerce database — things like collections of products, orders, um, customers, etc. Yeah, I think you're only Right. Yeah, you're only sharing the
There you go. There it is. Is it good now? Yeah, we're good. Okay, awesome. So, um, just to set the context, I'm on MongoDB Atlas, and I'm browsing and inspecting my Atlas cluster that I'm using to power that agent. Um, so, as I mentioned, the OLTP data lives right here. It's an e-commerce database, so all the products, orders, and customers collections exist here. But the interesting things, um, for me particularly, while building this agent — more specifically the agent memory feature — are the collections for messages and threads. So, um, you can see how memory is configured using the MongoDB store. Um, and here are kind of your threads, and then you can see the messages associated with those threads in this collection. That's awesome. So yeah, that was a little demo. Hopefully that gives a little taste of
what's coming very soon as a template. Um, and something else that we also want to emphasize is the capability of integrating MCP. So, um, what you just saw here were custom tools that we made. Um, but what we actually want to do over time is empower developers to just use our official MCP server — integrate it in, and your MongoDB tools are right there, um, without you having to whip them up as custom tools, if that makes sense. Yeah, I love the generation tool, because in our current vector implementations, for each store we, like, baked in some system prompt that would educate the agent to query properly. But that's so hard to pull off, right? It's way easier for the agent to get context to generate the query, and then you just execute it, rather than it having to learn the query syntax and then, like, fill in the bubbles or whatever. I love that, dude. I think that'll be a pattern, actually. You guys are making moves there. That's great. Absolutely. Yeah. And that's one thing coming
to Mastra, you know. For those of you watching, we have templates: mastra.ai/templates. We will be releasing a more official integration section, because Mastra does integrate with different services like MongoDB, and we want to make sure that if you're coming from the world where you're using MongoDB for something and you want to pick up Mastra, you have this easy-to-install, easy-to-use template that shows you all the patterns. And then, of course, you're going to build upon it, right? It's not going to get you all the way there, but it should educate you and also get you started faster. And so we're excited that, you know, MongoDB is going to be one of the first featured ones on the list. So, we've been working closely, and yeah, no exact date, but it's coming soon. Absolutely. I think, um, something I wanted to mention: the reason why we
love Mastra particularly is we think there's so much synergy between TypeScript, JavaScript, and MongoDB, and we've seen this historically. Um, and Shane, this is going to be a conversation we have in a loop over and over again. Um, but historically we've seen, kind of like, the MEAN stack trend, where developers were able to spin up web applications really quickly using MongoDB, Node, etc. So, uh, we really believe that, you know, with frameworks such as Mastra — and TypeScript frameworks in general — there's so much synergy with MongoDB, especially with Mastra being able to spin up an agent really quickly, as we just saw. Um, although I didn't show the code, which we'll show through the templates, it becomes very easy and simple. And I love the playground feature that you have here. It helps us quickly iterate on our agent and our prompts, and then test the results and run evals. So, um, yeah, thanks a lot for all your
hard work. Thanks for being a fan. Yeah, and thanks for, you know, helping us with the integration, too. Obviously there's more we still want to do, but it works pretty well today. And we're thankful that you all helped us get there. Absolutely. Happy to.
Yeah. Is there anything else you wanted to talk about? I want to say thanks for helping with the TypeScript AI conference. You know, thanks for being a sponsor. We do appreciate the help. It's great when you have great companies that want to contribute and really help spread the word around TypeScript AI. I think, as you just mentioned, there's a lot of synergy there, and it's exciting that you're all a part of it. Yeah, definitely. And, uh, I'm looking forward to seeing you all in person
next week. Next week. Yeah. And for those of you watching, you know, you can come join us. There are still a few tickets — actually not many. I think we're down to, like, maybe 20 tickets or something like that, last time I checked. I don't know. There are very few tickets left, but there are a few. And, you know, if it sells out, which it probably will, DM me and maybe I can sneak you in. I don't know. We'll see. Um, well, thanks for coming on the show. Definitely, as you launch more stuff, especially around memory, we'd love to have you come back on and showcase it — you know, later this year, early next year, whenever you have big launches, come on. For sure. We have some good stuff cooking up, so, um, I would love to come back and show those off, and show some more complicated demos. For sure — complicated but useful. And, you know, DZnost AI says, "This is exciting," and also asks, "Where is it?" I'm assuming you mean the conference. The conference is in San Francisco next week, November 6th, if I have my dates correct. That's when I'm going to be there, at least. So hopefully that's the date. All right, Gaurab. Well, it was great having you on, and yeah, we'll have you on again in the future.
Thanks a lot for having me on. See you, dude. Our sponsors are tight, dude. Very cool. Yeah, I mean, honestly, you know — and this is, of course, our conference, we're going to say this, right, we're throwing the conference — but I can honestly say I'm surprised at the quality of both sponsors and speakers. Not that I didn't think we would get some of them, right? I knew that, you know, we have friends. We were going to get some big names. I'm actually just surprised at the sheer quantity and quality of people we have. Like, yeah, the conference is going to be great. If you are not able to attend in person, sign up — at minimum, you can maybe watch some of the keynotes that are interesting to you, and, you know, if nothing else, you can tune in and get the recording. So please go sign up: tsconf.ai.
I like that it's single-track. And we've been going to a lot of conferences lately, so I don't want to talk, but I'm going to. I just feel like, with all of the speakers we have and what we're planning, when you leave the conference, you're just going to be in thought, because there's going to be a lot of stuff said that you want to think about, you know. And that's always a good feeling. Yeah. Yeah. I think it's just a good collection of people from the JavaScript/TypeScript ecosystem who are building really cool things in AI. It's all, like, you know, people who are active in this space, and the sponsors that we have are amazing as well. So, all right. Well, we have one more segment for today's show. We have, you
know, someone who needs no introduction because he's been on the show many times and coming back on again for I think this is the maybe the third time, fourth time. I I lose track, but we have Andy from Osmosis. So, welcome Andy. Hey. Hey. Me again. Yeah, it's uh I think it's my
third time. Third time. All right. Well, you know who's counting? But third. That's awesome. You're a recurring character in the
show. Yeah. Yeah. You are you are a recurring character. You even have your own name. It's Professor Andy because that's how we refer to you because every time we
talk, we learn a lot. And so, we're excited to just chat with you and figure out what's new in your world. saw you had a big launch, you know, recently. So, maybe you can talk about that. Yeah. How are things going?
Yeah. So, we just did our launch uh last Wednesday. So, we did like couple of videos uh went into like the mountains to film like this this nature documentary.
We got to show the video. So, let's Obby, can you pull that up? I'll pull it up. I'll show the video. I can talk about the
reason why we did that is because reinforcement learning as a concept has been inspired by nature, from the classic dopamine thing: like, oh, you did something good, versus you touch a stove, you did something bad. Um, yeah, so we decided to kind of do that, and then, you know, using goats to climb hills, kind of a double meaning there. Um, yeah,
it went well. I think we got a lot of interest and inbound from the launch, and um, yeah, been working on deployments right now. So, like, very very busy, but I guess it's a good thing, right? Let's inspect this amazing video, at which I laughed so hard. Yeah. Yeah, it was funny. Full screen that.
in nature, intelligence isn't static. Every step and every hill climbed is shaped by the world around us. And by learning from experience, we eventually succeed. We want to do the
same for AI. Hi, I'm Casey, co-founder of Osmosis. We're building the first end-to-end reinforcement learning platform that helps AI agents learn in real time.
2025 was supposed to be the year of the AI agent. Instead, companies realized that while powerful, AI agents still struggle when it comes to consistency and efficiency. Osmosis helps companies fine-tune open source models that can outperform foundation models at a fraction of the cost and latency. Osmosis is purpose-built for agentic use cases like tool use and code generation.
We integrate directly with your tools and offer a streamlined way for developers to spin up reinforcement fine-tuning runs. You can also set rules to enable continuous model updates. And best yet, we offer direct model serving and the ability to export model weights after training. It's simple. Your data,
your weights. That's such a fire. At Osmosis, our vision is to create a future where AI agents are able to self-improve. So, just like in nature, today's experiences will shape
tomorrow's outcomes. Nice. Well done. Well done, Andy. Well done. Thank you.
I think it's like if if you know you we know you too. So when you guys like come into frame for the first time. I was like oh dude. So yeah we know these people.
Yeah, you know, like we've hung out enough where it's just like, yeah, we know you personally, but it's funny to see it in video because Oh, there were so many bugs. Uh, when we were filming, we took so many takes because bugs would keep flying onto our faces. Yeah, many outtakes that I don't know if we'll do a behind-the-scenes. There's like bugs on his face.
No, literally, I think a bug almost flew into my mouth while I was talking. And that take was also really good. Like, I didn't mess up at all or anything, but we had to retake it because the bug was there. Yeah, I think it was very tastefully done though. Yeah, thank you. Thank you. Yeah. Yeah. It was nice cuz I would say it's well
done, but it wasn't so over the top. I've seen some launch videos lately that which, you know, I think everyone should launch their own way. If if you're a kind of company that wants to do over the top, you should really do over the top. But my like our style is definitely
not that. And I think you you had like a nice tasteful uh like it's really well done, but not so uh it's not too over the top. you know, we're not looking to do like whole skits or movie productions. Like that's
overkill for what we need to do. Just get our name out there, make some clever references, and that's about it. Yeah. Yeah. And there's enough of, like, the hill-climb goats, you
know? I just think for people in the industry, they can respect that. There's some Easter eggs in there, but it's, you know, still just kind of tastefully done. Yeah. Yeah. Thank you. Okay,
dude. So before we get into how crazy RL is hyping up, I think one point that justifies that is you guys just threw RL IRL. Yes. Give us a recap of that.
Oh my gosh. Yeah. We were amazed by the um turnout. So we did this in collaboration with YC and a couple of
other companies that were interested in or doing RL, like Reptile, Encord, um, yeah, like Reducto. So these companies were also interested in co-hosting with us, and then we brought on our angel investors. So Shangu, a senior staff researcher at DeepMind, did the main keynote talk. Um, YC had a hard
limit of 200 people that were allowed to attend the event. We were kind of aiming for like 300 to 400 signups. Ended up getting over 800 signups. So that was a pretty insane turnout. Um, and
yeah, like we had um we had some pretty amazing people show up. So yeah, it was a one day event. We talked a little bit about like building RL environments because that's also heating up right now. Um I think for folks who are mostly
like software engineers and builders, and not necessarily very experienced in ML or reinforcement learning, they can still get started and be involved by, you know, building these bespoke environments, and quite frankly, the labs will pay a lot of money for them. So that's kind of a new industry that's popping up. Um, and then
for us, we're helping companies do end-to-end deployments. So for us it's a lot more involved. Not just providing you an environment and being done, but we provide the whole inference, training, and collection stack. But yeah, it's been something that, I think, from the beginning of the year when we started working on RL, everybody was
just kind of like, oh, what is this, how does it compare to supervised fine-tuning, why do we need this? Still in a more understanding phase, right? And now everybody is like, oh, this is the key to unlock superhuman, or better-than-human-level, intelligence. Um, I think a really good example of classical RL is the um
AlphaGo paper, or Google DeepMind's Go player. Um, it showed RL can scale to beyond human-level performance, and now we're taking the same strategy and approach to language models and getting um agentic tasks to scale on the environments that people are building right now.
So if so people have drank the Kool-Aid. Is that what you're saying? I think so. I think I think people are
drinking the Kool-Aid. And I have mixed feelings. I think it's good that people are caring about RL, but I think it's also a time to take a step back and look at the state of the field and industry and see what can we do with RL, what can't we do with RL, and just be more realistic, right? Be more grounded about what this technology can actually let us do,
right? I think there is a school of thought where um people think that reinforcement learning is the key to AGI. Uh, personally, I don't think that is the case, but I do believe reinforcement learning can and is unlocking a lot of uh potential that these models have right now.
So it is definitely still something that is worthwhile exploring. So I I have a question of some I think we have audio issues. Yeah. Okay. Obby, I'm looking at you. Was it you?
I don't know. I don't know. All right. So, there's temporary difficulties. Uh, no. So, I got a
question. So you said, you know, people had the realization, trying to figure out the differences between RL and supervised fine-tuning. In as simple terms as possible, for those of us, you know, who maybe don't deeply understand these terms, what's the difference?
Yeah. So supervised fine-tuning, or SFT for short, um, is teaching the model to learn a certain behavior based on a predefined golden data set. I.e., I will tell you that 1 plus 1 equals 2,
and your job as a model is to remember that. So next time when I ask you what 1 plus 1 is, you should tell me 2. Uh, reinforcement learning is a reward-based mechanism, in the sense that I will not tell you what 1 plus 1 is, but I will give you a calculator, and then it is your job to push buttons on that calculator in a certain way so that you
can tell me that the answer is 2. And if you get the answer, I'll give you a positive reward, meaning I give you a piece of candy that you did great. And if you don't get the right answer, you don't get that candy, right? So the model is incentivized to try different strategies until it can figure out that 1 plus 1 equals 2. Now why is this a
powerful technique? Because in supervised fine-tuning, um, if I then ask you what is 2 plus 2, you might not know that it's 4, because all you did was remember that, oh, 1 plus 1 equals 2, easy. But in reinforcement learning, you learn the strategy of: here is the number that I should punch into this calculator to tell me what the
correct answer is. So that is the biggest difference between reinforcement learning and supervised fine-tuning. Um, in a more holistic way, supervised learning is a very classical method in machine learning where you're minimizing a loss function, right? And reinforcement learning is honestly very
different from traditional machine learning, um, because it is trying to maximize some kind of reward in RL. Does that mean you're optimizing to call tools better? Yes, exactly. So that also depends on
something called a reward function. Um, let's say that I am a Mastra agent and I am trying to call some kind of uh scheduling tool, right? And maybe there are five different scheduling tools. I can schedule a Zoom call. I can schedule Google Calendar. I can schedule a
doctor's appointment. What is the correct tool to call? That is what reinforcement learning will try to teach, right? If the user asks like, hey, schedule my doctor's appointment for next month at like 3 p.m. Uh, and
let's say that in one of the traces the model decided to schedule a Zoom meeting with a doctor, right, at 3 p.m. It kind of got it, but that's not really what we need. But let's say in another
trace, the uh model chose to call the schedule-doctor's-appointment tool. Then we reinforce that trace. So every single time a user asks to schedule a doctor's appointment, we know to call that specific tool. Mhm. How many traces
does it take to become more efficient? Yeah. So, reinforcement learning, the nice thing about it is that it can be quite sample-efficient, in that you only need a couple hundred really high-quality traces to get started and see some results. Now for production use cases, the real answer is your use case
may vary, but typically we see data sets of about 500 to 1,000 traces to get started, and then we will scale up to maybe 10,000 and beyond if we want more diverse and more robust uh model behavior. Great. Your product's looking better too from that demo. Yeah, thank you. We've gone through a lot of iterations. So for someone that
wants to do something like this on their own, can you what's the process for someone to take take one of these open source models and do reinforcement learning on it? What what tools did they need? How would they even go about getting started? I'm assuming that's at
least some of the what you talked about in RL, you know. Yeah, exactly. Yeah. So, can you tell us a little bit about it? I'm just you personally curious, but also I think a lot of other people are probably wondering like
what's what's it take to get started with some of this stuff? For sure. Yeah. I can talk about like
the way that our platform is built, because we handle a lot of this um hard work under the hood. So what we still require from users is tools, right? So, how do you define the tools and the code for those tools? Because we need to run them um in our environment, in our sandboxes. It could be a situation where you only expose an API endpoint to us, but you will still
need a harness to call the API endpoint if you don't want to share the code with us. Um, the other thing that you would require is a data set, right? A data set is things like: what is the system prompt, what is the user prompt, and what is the expected ground truth. The ground truth here can have uh a variety of meanings,
right? It can be, let's say, um, I'm training my agent to be really good at using Notion. Then it could be what the Notion database should look like after the agent is done with its task, right?
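To make that concrete, here is a rough sketch of what one training example of the kind just described could look like. The field names and the toy reward function are invented for illustration; they are not Osmosis's actual schema.

```python
# Hypothetical shape of one RL training example, following the
# requirements described above: a system prompt, a user prompt, and a
# ground truth (here, what the Notion database should look like after
# the agent finishes). Field names are invented for illustration.

example = {
    "system_prompt": "You are an assistant that manages a Notion workspace.",
    "user_prompt": "Add 'Ship v2 docs' to the Tasks database, due Friday.",
    # ground truth = the expected end state, checked by the reward function
    "ground_truth": {
        "database": "Tasks",
        "rows_added": [{"title": "Ship v2 docs", "due": "Friday"}],
    },
}

def reward(final_state: dict, ground_truth: dict) -> float:
    """Toy reward: full 'candy' only if the agent's final state matches."""
    return 1.0 if final_state == ground_truth else 0.0

# A run that produced exactly the expected end state earns full reward.
perfect_run = {
    "database": "Tasks",
    "rows_added": [{"title": "Ship v2 docs", "due": "Friday"}],
}
```

During training, traces whose final state scores well get reinforced, which is the same mechanism as the calculator example from earlier.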
If I want my uh agent to be really good at handling customer support tickets, then the ground truth should be, well, what was the resolution action that, you know, the model should be making, right? And then in the training process, you will have to specify what tools you want enabled, um, what data set you have,
and select the model size. Typically for model size, we suggest people start off with something around the 8 to 14 billion parameter range, because open source models today, even at small sizes, can get really really good with reinforcement learning. If there is a requirement, let's say your use case is very complex, has multiple stages, um, and has multiple different sub-agents
that work together, maybe it'll be worth it to use a larger model, like a 32B, a 70B, or beyond. Um, right now our platform can support anything 70B or below. Uh, if you want to do bespoke tuning for a 235B or 480B or even a one-trillion-parameter model, that'll be a more custom engagement with us. Yeah.
So that's why you were saying environments are really important, because if you're trying to run Mastra agents that have tools, you would need to initialize Node.js and then actually execute. And this is not coming from an MCP server. It's just something I wrote myself. Right. Exactly. Yeah. You need the harness to allow your model to
interact with the real world. And the goal here is to try to get your training environment to be as close a match to the production environment as possible, so that your model will not be uh experiencing data that it's not seeing in training. Interesting. That's super cool. Um, are you self-serving this now, or are
you still, you know, embedding yourselves in teams? So right now we're still very forward-deployed, and the catch here is that we're mostly using our platform right now to help people deploy models, so we can dogfood the platform. We are making changes to it every single day, and we hope to have self-serve ready by quarter one of next
year. Dang. Yeah. I mean, well, when you said that 800 people tried to get into the event
and you only allowed 200 in, my product manager hat comes on and says, "Dang, there's some demand there." So, are you running another one in the future? Like, what do those 600 people do now that they didn't get in? They weren't one of the chosen few to get into the event.
Yeah. So, we have recorded all of our keynotes, our breakout sessions. So, they they'll be available online. Um, we
will definitely look to do another one, probably not in the next few weeks or the upcoming month or two, but eventually we'll definitely do more events. We'll also do more RL reading groups as well. So, those events are geared more towards researchers and people that, you know, do this for their job. But we read
some uh research papers on the latest techniques in RL, what's helpful. We share some notes. Um, and the conferences that we do are geared more towards application companies or people that are developing agents. Yeah. But
we'll definitely do more. Yeah, that makes a lot of sense. I think, if there's that many people that are interested, holding the events helps one, so people can learn, but two, so you can meet people that are interested. It seems like a great way to uh keep growing the
interest in RL and I think it's it's here to stay for quite a while. Just very very hot topic. I hear about it all the time now. Yeah. Yeah. It's crazy, right? I feel like this is like a few months ago. Nobody
except me was yapping about it and Yeah. We went on those walks and you were telling me all about it. I was like, "This sounds cool, but you know, I'm only hearing about it from you." And then but that just means you were ahead
of the game. You were talking about it before everyone else was, and now, you know, everyone's talking about it. Yeah. I'll put on the "I was here before it was cool" hat, if that means anything. Dude teaches at all
these schools. But at what point, let's say I build my product, I haven't done anything yet. Yep. Zero, nothing. At what point do you suggest me coming to you
to, like, you know, start doing this? For new agent builders, our advice is always to use the best models from foundational labs, right? From uh OpenAI, from Anthropic, from Gemini. Um,
the reasoning for that is because once you train your own model, your model will be expecting almost the same structure every single time, or else you will have to retrain, right? The benefit of closed-source models is that, because they're so large, um, they can practically handle any environment. Okay? Right? But that's also a
limitation, because they're slow and they're expensive. Okay? Right? So
once you have enough traffic, I would say upwards of thousands or even tens of thousands of user sessions or user requests per day, that's when it will make more sense for you to scale using your own model. The benefit of using your own model is also quite clear. Typically, we see engagements where companies can compress GPT-5 down
to a billion-parameter model. This usually yields up to a 15x speed improvement, and it's only costing them about 5 to 10% of the cost. And typically performance is about the same, if not even more stable, because we train the model to be really really good at that company's agent environment itself.
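To make those numbers concrete, here is some back-of-envelope math. The baseline latency and price are invented placeholders; only the ratios (up to 15x faster, roughly 5 to 10% of the cost) come from the conversation:

```python
# Back-of-envelope math for the compression claim above. The baseline
# figures are hypothetical placeholders; only the ratios (up to 15x
# faster, roughly 5-10% of the cost) come from the discussion.

baseline_latency_s = 6.0   # hypothetical frontier-model latency per request
baseline_cost_usd = 40.0   # hypothetical cost per 1,000 requests

tuned_latency_s = baseline_latency_s / 15    # up to 15x speed improvement
tuned_cost_low = baseline_cost_usd * 0.05    # ~5% of the cost
tuned_cost_high = baseline_cost_usd * 0.10   # ~10% of the cost

print(f"latency: {baseline_latency_s:.1f}s -> {tuned_latency_s:.2f}s")
print(f"cost per 1k requests: ${baseline_cost_usd:.0f} -> "
      f"${tuned_cost_low:.2f}-${tuned_cost_high:.2f}")
```

Whatever the absolute baseline, the ratios are what matter: a 15x latency cut and a 90-95% cost cut change the economics of running agents at thousands of requests per day.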
I know what company you're talking about. I'm just going to keep it to myself. But uh, are you publicizing who you're working with? So for some companies we are doing case studies that will be coming out, and for some companies, per the agreement, we are not going to discuss, unfortunately. Yeah. Well, I mean, I've seen some things
shared on on X from people in other companies. So I think there's at least some that I've seen that are public. But that that's cool. I'll look forward to
seeing some of the case studies. Of course. Yeah. And and I'm sure everyone listening is
is going to now keep their eyes out. Um, awesome. Well, let's see here. If anyone
wants to follow Andy, you can do so there. Check out osmosis.ai. If Andy's described things that are of
interest to you. Anything else to close out, Andy? Um, yeah. I think this um this is
becoming a really exciting market very quickly, and I would love to see how RL turns out for 2026. I think 2025, originally, I think it is still the year of the agents, but very quickly people realized once agents are working at scale and working in production, they may want to own everything in-house, and at that point RL becomes a really attractive solution. So yeah, if you're
building agents, if you have a lot of traffic, and you're seeing things like it's too slow, you're not controlling the quality, um, you're kind of playing whack-a-mole by just adjusting the prompts on different edge cases, yeah, definitely talk to us. Awesome. And let us know if you need a Mastra environment.
Yeah. Yeah, definitely. All right. Thanks, Andy. We'll see you later.
See you. Take care. What a doozy that was. Yeah, school was
in session. School was in session. I don't know if I'll pass the test, dude. I'm not gonna pass this test, but you know, I was here. I attended. Do I get
attendance credit? No, but at least you'll get like a D at least. Yeah. Yeah. Yeah. At least passing. Uh,
awesome, dude. This was This is a long show. Yeah, it was really good. Yeah. Um, so much information.
Tons of great guests. I guess we can play one of our outro. I know. I was going to just say like find an you pick one of the outro songs.
You know, everyone's got their copy of the book. mastra.ai/book if you don't have your copy. And we will be back again next week.
We're going to do it in person next week. Yeah, I think so. So, next week, just housekeeping, it's going to be a little later because I think we're going to have to do it like an hour later next week. So, because I I think I land right
around this time. So, you know, we've been kind of moving things around for travel schedules and we will still do it on Monday. Might be a little later in the afternoon, but maybe we'll have some others that can join us for the show as well. Yeah, we'll probably record some content at the at the conference and post later.
Yeah, we have a bunch of people from the Mastra team that are traveling in too. So, it's going to be a fun week. Yeah.
So, for those of you that are not following us, you know, please do. You're going to see a lot of cool uh fun content over the next week. And if you want to follow us, you can do so on X right there: smthomas3 or Abhi Aiyer on X. And yeah, you want to send us off?
Sending Sending us off. Uh yeah, I know you It's not I can't hear it. You have to share your screen. Okay.
technical difficulties. We'll get better at it. Yeah, we got to learn. There's there's probably a way just to share some audio.
Yeah. Thanks for tuning in. Peace ages. Yeah. Master.
I'll be in shade. Signing out now, baby. Every week we drop the knowledge by agents wild and free from automation to pure strategy. That's where you need to be.
All right, everyone. Have a great week. We'll see you next time.



