Back to all episodes

Mastra fundraise, Is Lovable Dying? Superagent, Ragie, and more AI news

October 14, 2025

Today we have guests from Superagent and Ragie. We highlight the Mastra fundraise, talk about Lovable, and give our take on AI music. Finally we answer the question... is Abhi a Vibe Engineer?

Guests in this episode

Ismail Pelaseyed

Ismail Pelaseyed

Matt Kauffman

Matt Kauffman

Watch on

YouTube Spotify

Listen on

Episode Transcript

4:11

Hello everyone and welcome to AI Agents Hour. My name is Shane. I'm here with Obby. What's up guys? And as always, we're coming at you, well, not as not as

4:23

always, but most of the time we come at you on Mondays, noon Pacific time. And that's what we're doing today. We're going to be talking to a few guests. We're going to be talking some AI news. We're gonna be talking about, you know,

4:35

if Oby's a Vibe engineer or not, and among other things. We have a lot of cool stuff to talk about today. But how you doing, dude? Pretty good, man. It was a a wild weekend here in San Francisco, but back

4:48

at HQ right now for today's stream. Um, last week was tech week, which honestly is just an excuse for everyone to party every day and uh, we did. So, it was fun. Yeah, it was a good week. I was there most of the week and I'm glad was glad

5:04

to leave. Oh, man. I wish I could leave right now.

5:10

But it was a good week. It was a good week. Yeah. Uh, yeah. I guess first of all, I'm not

5:16

in my normal location. How's my audio coming through today? Pretty good. I can hear you. Good. Okay. Good. Yeah. It's a holiday

5:23

today, so the kids are not at daycare, so they are somewhere. So, I had to leave. I couldn't stay in my normal office or you we would have little kids uh coming in during the stream. And so, I figured I would go to a different

5:33

location today. We have a bunch of stuff to talk to talk to talk about in today's stream. So, I'm super stoked about it. There was a lot of drama over the weekend just in

5:45

general about many different things. So, um yeah, it's a good day. Well, if there's one thing that we're good at, it is stoking the flames of drama. Yes.

5:57

At least bringing it up. Yeah. Yeah. You know, we, you know, most of the time, you know, drama just, you

6:02

know, goes away, but we like to just remind everyone that it happened. So, if you missed it, in case you missed it, we will keep you up to date with uh some of the drama that we've seen. And I mean, little do people know, this is a reality TV show in reality, so we bring it all here. Yeah. Yeah. We we don't we don't hold

6:19

anything back. Uh you know, and Ruben is letting everyone know that this is a live show. So, if you are watching this live on YouTube, on X, on LinkedIn, just comment if you have questions. We'll try to answer some of them along the way when we can. If you aren't listening to

6:38

this live, you might be watching it after the fact or listening it after the fact on Spotify or Apple Podcasts. And Ruben does say he's only here Ruben's only here for the drama as well. So, you know, good. We will try to bring it today. Uh, but if you are listening to this after the

6:56

fact on either Apple Podcast or Spotify, if you want to give a fivestar rating, please give us a review. If you want to give us a four star or below, find something else to do. Please uh don't give us rating. But we do appreciate

7:07

five star ratings because it does help other people learn about uh the drama we talk about and hopefully some of the useful stuff as well. Yeah. Yeah. Should what do you want to talk about first?

7:20

Uh some random stuff uh first. So, over on on Friday, I hung out with our homie Sherwood, and we recorded a Twitter space called The Agent Boys with the Z, and we were I don't know, like we probably won't do that same format because we were like getting drunk, but there was a lot of uh just good talking in there. Um, and so I think we

7:44

might do another episode in the future. But if y'all want to check that out and hear me talk a bunch of about graph theory and random stuff like that, check it out. I could post the link. Um, but

7:55

yeah, other than that, same old I I saw that come up. I have not listened to it yet, but I knew what you were going to say about it because I I just knew what we were what you were getting into. So, yeah, now I have to listen. Yeah, we we talked decent amount of uh

8:14

but it was also pretty good. They asked about like some uh an engineer from Replet came on and we chatted with him and some other like founder type people came on. So, it was actually not too bad given that it was like 9:00 or something and who would want to listen to that uh in general, but I guess people do.

8:36

Yeah. You know, everyone has different schedules. You never know when someone's just put the kids to bed and that's all they need in their night is, you know, listen to some agent boys. Yeah. Well, I mean, even for this show,

8:48

like you never know who's watching. And, you know, we met some people last week that were like, "Oh, we know you from the stream." And we were like, "Wow, that's cool." Because we don't really know outside of the people chatting here

9:00

live, we don't necessarily know who's watching the show. And um so if you are watching take part in it live and that'll gives us some uh confidence to keep going. Yeah. All right. Uh we did have some announcements of our own I guess this

9:19

week. So, if you didn't see last week, I believe it was on Thursday, we announced, you know, MRA announced that we had closed our seed round. So, we'd raised some money so we can keep bringing more MRA to you all. So, that went, you know, was very wellreceived,

9:37

but we did have, you know, it was a nice uh kind of fun video that was included. And Obby, you said some some choice comments that resonated well well or maybe not so well with people. It's definitely for some people they did not like it. So yeah, let's uh so this was kind of

9:56

Let me pull it up here. Uh I thought this was very funny. And then of course for those that didn't see the video, maybe you can all uh find it find Sam's post on Twitter or or HML as well, you know, like whatever clip that The quote tweet here was uh the ecosystem was polluted with Python.

10:25

And that's in reference to uh a quote that Obby says in the uh in the video itself. I'm not going to play the whole video, but I think we should become a book publishing company. nah from and after with reactive developer build agent different we were trying to build an agent for a different type of idea and we just found that the whole ecosystem was polluted with Python

10:55

so that that's the reference there uh for those of you that uh didn't see the video but go watch it you know leave us a comment polluted with Python share uh but the funny thing is that's actually you Partially the reason we created Maestra is that it's not that like I I used to write some Python code like I think we all probably wrote Python. I really just didn't want to go back to writing more

11:20

Python. I thought like what is the what is the actual point? So I don't know that was one of the reasons I think that Mafra came to exist is that we didn't want to go back to writing Python. Yeah. And the dev dev tooling sucked at

11:33

the time and so we wanted to change that as well. Yeah. Especially for us, the tool the tooling was not wasn't very good.

11:40

Out of that whole video though, people were like polluted with Python and there are a bunch of Python maxis who are like, "Oh, these guys their JavaScript." It's like, yo, just chill out, dude. It's just a joke. By the way, that's

11:53

another thing. It's just a joke. Like, I don't hate anybody. It's just a joke. Like, but people go on the offensive and

11:59

stuff and I mean, I knew what I was getting myself into, so I don't really care. But yeah, I mean, sometimes you uh you say something because you know it's going to get a reaction from some people but ultimately it's just to kind of like have you know you should be able to poke fun at each other a little bit right like yeah us TypeScript guys should be able to

12:16

poke fun at the Python guys and they can poke fun back too right you know yeah it's all love it's yeah it should be it should be we should be able to have some fun here so yeah it wasn't you know we don't hate Python but you know we are going to poke fun at Python a little bit yeah I guess you know they say talk get hit so that's is

12:42

hopefully not. Um so one more thing before we bring out our first guest and then maybe some of the other stuff we can pull out at the end. Um I found this interesting post from uh Simon Willis who I think you know kind of came up with vibe coding right and is now talking about vibe engineering. So the difference between vibe coding and vibe engineering is that you know

13:06

vibe engineering is a little bit more sophisticated has a little bit more structure. It's done maybe by an engineer in a certain way to produce a certain result. So there's a whole blog post on what Simon Wilson defines as vibe engineering. So we're not going to go through all that because we do have a tight schedule. But what's your opinion

13:25

Obby? Is this one gonna stick? Is this one gonna stick like vibe coding did? I don't think so. Um, well, okay, here's

13:33

the problem. Like, anytime there's a new term, people are going to try to make money off of it. So, I I assume that there will be companies that uh will get on the Vibe engineering, like your product is a vibe engineering stack or something like that. But, I just don't understand how this is different than

13:49

Vibe coding or like, you know, maybe vibe coding's played out so you need to have a different term or something. So, I'm not sure, but he does, you know, he has produced a lot of industry terms, so maybe that will stick based on who the author is. Um, but if we said Vibe Engineering, it definitely wouldn't have a chance of sticking. Yeah. Yeah. I mostly agree with that. I think uh VI vibe coding kind of had a

14:16

moment and now maybe this is this will have a moment, but it's not going to be the same. You know, it's I do think there is a difference. You know, when people say vibe coding, they it almost makes me feel like they're just uh just letting the the LLM write all the code without any intervention, which I don't think is actually the

14:35

case. I think this is just arguing that maybe there needs to be a term for LLM enhanced engineering and rather than just like letting the LLM run. But yeah, I think I think it was a fun read. I

14:48

would recommend giving it a read. I don't know if it's going to stick. I kind of doubt it, but we we'll see.

14:54

We'll be here. Coding is for noobs and vibe engineering is for seasoned vets, right? But like I don't know what even internally when we vibe stuff, we don't say the coding or engineering part. We just said, "Oh yeah, I vibed this real quick because

15:05

I'm already an engineer. I don't need to explain what I'm doing, you know?" Um, and at the end of the day, you're still putting your name behind it, you You know, I guess that's maybe the difference of using some of these vibe coding platforms where you're shipping prototypes really quickly and maybe you do make some of them get into

15:22

production. But if you're on a actual team and you're using an LLM to help you write code, ultimately you're still responsible for that code being written at some level. So your name's on it.

15:34

So I don't think it really matters. Yeah, same. Anyways, fun aside. And with that we are

15:42

going to bring on our first guest. So we have Ismail from Super Agent. Welcome. Hey guys. How are you doing?

15:55

Good. How are you? I'm fine, thank you. Thank you for having me on. Big fan of MRA. Of course,

16:02

as a OG agent builder myself, I I I I definitely, you know, uh like the TypeScript first uh approach to things. So, happy to be on, happy to chat with you guys. Yeah, same here. Um I really love y the work y'all are doing because oftent

16:23

times and I guess actually maybe I should stop and let you introduce yourself and what your company does and then I'll fanboy after that. Yeah. Uh, so I'm Ismile. I'm the CTO of Super Agent, CTO and co-founder. Uh,

16:38

super agent is basically small language models that defend your AI agents from prompt injections, jailbreaks, you know, leaking sensitive data, all of that nasty stuff that can happen when you connect an LLM to your data sources and want to keep that LLM like safe, secure, and uh, yeah, so that's what we do. Our core thesis is like uh you know

17:08

as code you know the amount of code that is going to be generated the amount of apps that are going to be generated by AI is going to skyrocket that's obvious for everyone right uh the current like security solutions the legacy security solutions for those uh kind of apps uh needs to change. So the reason for that is that you know we're we're going from

17:35

a very deterministic like way of building products to a more probabilistic way and that you know opens up a whole u you know can of worms when it comes to security issues as you know you mentioned Simon u earlier uh in the stream he has a bunch of posts on just jailbreaks that he does right uh so that's a great example of what we are

18:00

trying to you know uh help u builders achieve like a security for their AI agent basically. So like uh we we talked to a bunch of customers that are getting close to production with their AI agents and they ask us like what is like one most of them want a silver bullet for security right they want to they want something they can install but they

18:27

you know they don't want to do the work let's say to to cover everything. Maybe that's because you know old security was just having a badge on your website let's just say but nowadays you got to do a bunch of So like what would be the minimal thing a user has to do to start becoming secure? I think the minimum thing you need to do

18:47

is to uh you know try to separate out different data sources from different agents. Try to compartmentalize your whole workflow uh and you know depending on what you're trying to build a lot of people are building coding agents now that run a sync you know workflows in the background. I think the first thing that's important there is, you know, sandbox that environment. Make sure that

19:12

it only has access to uh stuff that it act absolutely needs access to and make sure to, you know, cover u observability so that you have insights into what is going on. I think that is the most important part uh for somebody that's just starting starting off. Um um so I would say like sandboxing is super important. I would say you know comp compartmentalizing

19:40

your app is super super important when it comes to to security in general. And then what's the next level from there? I think the next level is you know uh what we've seen like the the reason why we built super agent is quite fascinating. We were building something

19:58

completely different. Basically, we were doing like a rag app for compliance teams so they could pull in all of their compliance stuff and then they could ask questions, right? So we got this enterprise deal and you know the CIO was you know wanted to implement it and then we got to you know all of the nitty-gritty compliance details and you

20:20

know in Europe where we are based right now uh you know AI regulation is much much harder to pass by than than than in the US. So we had a lot of stuff that we needed to do you know we needed to get you know accredited audited a bunch of stuff and then when we had all of the stuff you know the sock 2 everything was in place we started getting questions

20:45

about like the AI agents themselves stuff that's not covered by regular like sock 2 uh you know data trust centers right so that's where we we we thought that you know uh the idea is to help u founders close more business by giving their customers, their users transparency into what these agents can do, what models they run on, you know, are they training on your data, what tools do they have access to, how are

21:16

they connected to your tools, you know, is this thing secure or not. So I think that's the next step you know uh getting uh being very transparent uh and and adding these uh you know guard rails that prevent from prompt injections or leaking data. Um you know that would be like the next step I think for when

21:38

you're out there trying to close a customer that will be the thing that the CIO will say you know yes or no to. Yeah, there are a lot of security frameworks that are coming out uh especially in America like the NIST has one which is like a you know manage whatever observe whatever like but then there's other ones too that I guess will all coales into some cert certification that could put on your

22:07

company right but then I guess each of those building blocks are still what you described you still have to do the work in each of those steps right yeah you have to do the work. And the the the the the sad part with uh you know security is that you only care usually like teams that we talk to, they only care about security when they're

22:27

trying to close a deal. Yeah. You know that's that's that's really like that's why they are doing it.

22:33

They're not doing it just for security's sake. They don't care about security. They just want to close the next deal.

22:39

They just want to grow like everybody like us, like you guys, right? Yeah. Uh so security becomes this thing that you do when you need to to to show someone something right hey we got this thing and that's fine that's all great you know uh our idea is like you know we have to understand what people you know why people want security and then build something that makes it super simple for

23:04

them to get you know a high level of guard rails a high baseline implemented in three to four minutes. That's like what we are trying to do. So, we have a very simple SDK. It runs both in

23:18

Typescript and Python for you guys that that still love Python, right? Yeah. I can't imagine there's a lot of Python people listening, but if there are, it's good to know that they're supported. Exactly. Exactly. And you can just plug

23:30

it in like, you know, like MRA or a, you know, we we we did a example with your input processors. Just works out of the box. It's a model that that keeps your stuff safe, you know, and u so I think that uh it is as you say something that you need to do continuously but I don't think people do I think they do it once or twice a year when when the sock 2 review is uh you know coming and and and

23:56

and that's about it. So, we we're trying to like build something that people, you know, uh we don't want to we don't want to like, you know, um put a bunch of burden on people. They don't need that.

24:10

They just need something that actually works with the way that they want to work with security. Um so, we're trying to build something like that. Yeah. Should we break down how super a like different components of super

24:22

agent? Because for me, like what I love about Super Agent is it exactly what you're saying. It like doesn't make it like a burden. It makes it like because it's a really easy integration. Then you

24:34

don't have to think about it that much, right? You just do it and then you can expand it over time. But yeah, the floor is yours. Like let's get into super

24:41

agent. I'm going to share my screen. Like I'm just going to show you the easiest uh easiest way to get started. Uh you know

24:48

uh what you do is that you go to like superagent.sh. You create an API key and then inside of your app you install the super agent SDK uh and then you spin up a client and then you can decide what type of security you want. So here we have two different examples. We have something

25:08

that we call guard which basically helps uh or keeps your agent safe from prompt injections, back doors and that kind of thing. And then we have uh another method called redact which basically takes a uh a string whatever string it might be something that's generated by an AI might be a human generated string as well and then you can basically pass

25:32

in the rules you want you know I want to get rid of PII PHI and you know secrets and API keys. So you can configure that very easily in this SDK and then just run it and then you get a a purified version back basically as you see in the example down here. So it's super simple to get started. It's completely open source. It hooks into all like agent

25:57

frameworks that are worth mentioning at this point. uh you know it it hooks up to your coding agent you know AI SDKs that are out there or other gateways that you might be using. So it's super simple to get started with um so you can just spin up the SDK. Uh if you're like a power user and want to run a full

26:19

proxy on all of your AI, you know, traffic, you can do that as well. So we have a proxy endpoint that you can just connect and and use instead of your open AI like API base URL um and and uh you know get that rolling pretty quickly as well. Uh and all of that is basically powered by a bunch of small language models that we have trained uh what we call reinforcement

26:47

fine-tuning which is a which is a you know different from the classical like fine-tuning that people do and we've trained these models to be highly effective in stopping um these kind of attacks and since the models are really small you can train them almost in real time. That means that we can, you know, update the model's weights every day. So

27:13

if new attacks appear, we can train for those attacks and we can make sure that those are those weights are pushed to to our users, you know, more or less in real time. Uh, which is a huge thing I think in in security. It's like the good old days when you have the McAfee like badge. Uh I don't know if you guys remember that but you know if that badge isn't updated for two years

27:39

that badge doesn't mean anything right so we're trying to keep it like real time as much as possible uh and that that's basically our core so we do the models and then we try to you know provide um integration points which make sense for users and yeah and we did have a question from the chat which I think you kind of already answered, but I'm going to state it

28:04

anyways. Uh, so Ruben did ask, how is this different than prompt engineering and input sanitation? And I think you kind of got into it a little bit around the it's it's more real time, right? That the techniques that you use today might need to change as new

28:18

exploits come out, but is there more is there more to it than that? So I think that 99% of uh agents have their security baked into the prompt, the system prompt. So the system prompt becomes like the security layer for the whole agent. The problem is that uh these attacks are so

28:39

uh you know devious in a sense that they can implant as soon as they reach your model it's too late. So if an attack reaches the end point where you're running your open AAI agent it's already too late. If that model sees those tokens there are a bunch of studies around how they internalize data. Maybe it's it gets saved to your memory bank

29:02

that you have you know separately and then they can exploit it later on. So what we're trying to do and I think what you need to do and I think what we also saw from OpenAI having the same direction with their agent builder is that you need models that are outside that act like outside of the main agent

29:24

model that you have something that you know you can pollute but doesn't actually pollute the agent, right? So it it's like a filter or a condom if you will. uh for for your AI agent. And I

29:38

think that's um that's the way the entire industry handles safety right now. And I think that's a good approach. They call it the dual model uh like setup. So you have one main model and then you have safety models and a bunch

29:52

of different other smaller models that that work around that agent. Practice safe agents, you know. Exactly. Exactly. SLM That's exactly right. SLM architectures

30:05

I feel are not not necessarily like mainstream right now, but they are used for safety or even object generation and like these smaller tasks and I just think that's going to put that on my bingo card for 2026 that SLMs will take off more. I think I think people need to realize how you can leverage them and

30:23

not everyone knows how to train them either. So that might be a very interesting space and I think I think the problem is that uh you know there are a bunch of awesome small language models out there that are open- source and release. The problem is that if you learn how to fine-tune them which which is an art in itself because

30:43

the data you would need to fine-tune for a specific task, it's pretty hard to generate that data, right? Yeah. So that's where your mode is for your model really. uh and and and uh you know

30:56

so uh I think that if you if you're able to train it the problem then becomes where do I run it? Yeah. Like that's a problem as well. Running these fine-tuned models isn't as

31:09

straightforward as one might think. So I don't think that like the whole I I I think that the whole not only small language models but also the whole fine-tuning ecosystem will blossom the coming years because people would need to train these models for different tasks and run them on some data center

31:28

100%. 100%. And as a followup Ruben asks so the input goes through super agent before even reaching the agent we build in MRA.

31:39

Correct. So if you add the guardrail, if you add that JavaScript TypeScript snippet, what it does is that it it the prompt or the the cont tool call, it might be, you know, a function call, it might be an input prompt, it might be any type of input output, you pass it through super agent, it will basically give you back depending on what method you use. Is this thing secure? Why isn't

32:04

it secure? What what security rules does it break? And if if that happens then you can you know u you know trigger some other workflow in your app and if it's safe you just pass it on to your model.

32:17

So it goes through our model first and again I have to mention the models are open source everything is on hugging face the data set is on hugging face the different model sizes and weights are on hugging face. So if you don't trust us you can just spin up your own and and run it wherever you want. Really

32:35

that's awesome. That's cool. Yeah. Super cool. Thank you. And I'm glad you guys integrate with us.

32:41

That's definitely Yeah. Yeah. And I think, you know, one thing that's missing in this whole like AI agent framework thing is uh I think one thing that you you've done really well is the hooks that you have on inputs, outputs, you know, tools and all of that. I think that's a huge thing. And I think that more people will leverage those hooks as uh you know

33:04

these agents diffuse into you know organizations all over the world. I think that that will be that's something that I really like with Mastra the whole like it feels like a concept more than a framework. Yeah. and and uh I I appreciate that more than because frameworks come and

33:22

go, you know, but concepts they stick around for for more more time and and become more like pervasive in workflows and stuff like that. So yeah, I I really enjoy master. Happy to to work with that. Awesome. We have a some questions as well. Some good ones here.

33:41

Really good ones. Let's do this latency one. So is there a noticeable latency with additional layers here? Of course. So there is latency because you put you know uh a model in front of

33:54

your original model and that's also a challenge that we have. Uh and that's also a reason why we use small language models because we can uh you know quanti quantitize them down to very small effective models that run on custom kernels or whatever type of um hosting that you would want for your uh you know u endpoint. So on our cloud we run it on

34:21

fireworks so we have their you know latency so it's really fast and adds around you know I would say depending on the size of the input uh and the size uh of the of the output as well um somewhere between 100 to 200 300 milliseconds uh on really large like objects or whatever you might want to read act or or guard against. So there is latency. Um and I'm hoping like you

34:54

know I'm hoping that we will see even stronger smaller models that will run in the edge without any GPUs that are super fast. Um I I think that that day will come and then latency won't be that big of an issue. Awesome. And then I think our last one here,

35:14

what are the model sizes used in production and what trade-offs did you make in choosing that scale? So the one we run in prod is a 20 billion parameter model. Uh and the reason why we chose that and not a smaller model because we used you know a 270 million parameter model. We use the three billion parameter model and the reason is that the latency and you know

35:38

the chart for latency and accuracy on the task that we trained for you know had a sweet spot at those at that 20 billion parameter model. Uh also we do uh you know we do reinforcement fine-tuning. So we incorporate chain of thought and a bunch of that kind of stuff. And that's still not there yet with these super small models that they

36:00

reason well. And I'm hoping that that's something that will change soon. I'm hoping like the next Gemma that that get gets released or the next FI that gets released will have you know baked in strong reasoning chain of thought capabilities and that but 20 billion I would say is a good model. I think you can run most workflows on a 20 billion

36:20

uh parameter model. uh the the thing where it gets you know uh tough is when you have these broad tasks like deep research or something like that then you might want something bigger but but for for regular like uh these kind of tasks security tasks or or other classification tasks I think that a 20 billion parameter model or even down to seven billion parameter would be um

36:45

quite sufficient. Awesome. And last comment here from this week. Great work. Nice. Thank you, brother. Thank you, brother. Yeah.

36:56

Yeah. We uh we uh we think it will be, you know, we we think that we can help other founders out to to close more deals and and get secure. Yeah, I'm super bullish on your product for sure. Thank you. Um I think that's all the time we have today, but if you ever want to come

37:14

back, if there's any new developments, you're always welcome. You're Of course. Thank you guys for having me.

37:19

Of course. Anytime. Thank you guys for having me. Yeah. Have a good day. See you, man. Yeah. Thanks for coming on.

37:25

Thank you. All right. Cool. That was sick, dude. That was sick.

37:31

That was some good questions. Uh, and so, thanks audience. This is interactive, so please ask questions. We

37:38

will do our best to answer them. And it makes for more engaging show because obviously we're okay at asking questions, but you know, it's nice to have some some help. Y'all have been asking some great questions, too.

37:50

Yeah. Before we bring the next guest on, I'm just thinking when we did input processors, we made our own like PII detection more as a reference than anything. But we were using, let's call it, the big models. So, it does add significant

38:03

latency. And we always thought, oh, if we had SLMs, we would use them, but we don't. But Super Agent does. So, I

38:10

highly recommend checking them out. They have a nice integration with us and it's probably the right way to do our PII detector than what we're doing. Maybe Yeah, I think there's, you know, it's one of those things if you want to get some basic level protection, you could pull our stuff off the shelf. It's

38:27

probably going to help, but it's kind of like, you know, what Ismail said, it, you know, it's not there are new threats that come out. It's not going to be necessarily kept up to date or catch everything. you know if it's static then eventually people will just find a way around that specific guard rail right so there is something around using a model

38:45

and using a service that is continually being updated and you know especially when things are changing maybe someday it becomes less uh then the threats become more known and so maybe static can work a little bit better I doubt it but maybe maybe but now things are changing so frequently I think you kind of want something that's kind of living and breathing a little bit in front of especially if you're you know dealing

39:09

with things that are heavy in compliance or regulation and you need that extra layer of you know security or piece of mind. also liked how he said that you know in the in the case of this like we introduce a concept right like the input processor like we don't necessarily have to be on the hook for everything that an

39:28

input processor could do our job was to introduce the concept you know um so there should be other companies or products that can hook into MRA at any point and you know use the concept to do actual production level so that was dope yeah and it's almost like we planned that Almost. Almost. We got or we dumb lucked our way into

39:50

it. Sometimes it's a mix of both. Yeah. All right. Uh we're gonna bring on our

39:56

next guest. So, this is, you know, we've been wanting to get someone from Raggy on for a long time because we we've uh we're close with with the people over there and so today we're we finally got them. So, we got Matt from Raggy coming on and yeah, we're gonna talk about what Raggy is and what they do. So, welcome

40:15

Matt. man. Good to meet you guys. Yeah, good to meet you, too. Nice to meet you.

40:22

Yeah. Um, so I know that we've done some stuff uh like did done some events together um in the past. Unfortunately, I'm on the East Coast. I wasn't able to attend, but I heard that it went pretty

40:34

well. Yeah. Yeah. Yeah. It's always fun uh throwing or drinking some beers and uh eating

40:39

some pizza with with the raggy crew. Yeah. Especially Bob. Yeah, Bob's fun to

40:46

hang out with for sure. Bob and I go way back actually. Yeah. Yeah. But I think this is probably a

40:52

good time. Uh maybe quick introduction to yourself and then can you tell us a little bit about Raggy because some of the audience probably knows but I'm sure a lot don't. Sure. Um my name is Matt Kaufman. Um I'm

41:03

one of the founding engineers here at Raggy. Uh and Raggy is uh Rag as a service for developers. Um I I think that we uh the founding team we had built rag a few times probably six times if you counted all the founders um had done it independently and it it seems like a fairly easy like you put some stuff into a vector database you create an embedding of a

41:28

query you see what matches and you can stand up a naive rag pretty quickly but it turns out that to get a rag that actually works well uh takes a lot more steps there there's the um extra raction side, there's a chunking side, there's uh as it turns out, just pure semantic search usually isn't enough. You also need to do something with uh maybe elastic search or some flavor of keyword

41:54

search um to get results that semantic might not have picked up. So we uh we just decided to build the rag pipeline and the rag retrieval service we wish we had. Um so that get getting rag in your AI app was as simple as like mpm install rag. And so that's that's kind of what where we started and what we're still pursuing.

42:21

Yeah, that's cool. Um, yeah, I think there is there's something about knowing that when you, you know, regardless of kind of the the documents or the content you're throwing at it, it's going to be making sensible choices for you. I know when I was starting uh just building rag, I was like chunk size, like I don't know, like just going to pick something

42:43

and I just picked something, right? And then I ended up that was what I and I guess this was just like a prototype, but that was what I ended up sticking with the whole time. I never went back and re-evaluated it and never questioned like am I getting the right level of accuracy and then if I wanted to compare now I have to do a ton of engineering work. So I think there is something

43:00

around just having the the ability to throw your content at it and know that it's going to make some sensible choices for you but also just give you that uh that API that you you can tap into when you need it. Yeah, I think that that's uh that's actually like an interesting problem, right? you think about chunking, you're

43:18

like, "Okay, can we keep ideas together? Can we try not to split on sentences?" And you get that all tuned for text. And then you get a table and that table's

43:30

large and it doesn't fit in a single uh embedding. So, how do you maintain the context of that table when it's split across multiple embeddings? So, we've got like a specialized table chunker.

43:40

We've got specialized data chunkers. um if something was an image, we try and uh chunk it with a caption if we can find a caption. So there's a lot of like you can go really deep on all these problems and um like we want to be the AI team for you that does that can just 100% focus on those problems so you can focus on you know what your product does.

44:05

like a year ago when we met Bob the first time we weren't even working on MRA and I think it was like the day after Saxs was like on his all-in podcast that mentioned you guys and I think that's a long time ago. It feels like a long ass time ago now. It's like 15 months probably. Yeah.

44:22

Yeah. It feels like ancient history now. I I I actually was uh responsible for getting like the the website up just in like the nick of time to have like a one up to learn more and I was like ah we'll get a few email addresses and we got hundreds. A lot. Yeah. And that was also during a time where like let's say rag was the the topic of

44:41

at hand for the AI community and then we had the rag is dead and then we had rag is not dead. Uh then we had the million context window event and now we have context engineering. As someone who's worked in rags since let's say the beginning, right? How do you feel about

45:00

all these things coming and flowing? Especially if your company is called Raggy. Um I think that it if you squint, they're they're all the same thing, right? It's all about getting the right context into the window. um you don't

45:15

you may have a million token context window, but you want to pay the co the money cost and latency cost for those million tokens. And then there's a lot of studies that are showing that like more is not better. There there's like one that was called lost in the middle where tokens in the middle of the context were basically ignored stuff at

45:34

the beginning at the end um it would pay attention to. A lot of the uh the metrics out there are kind of like needle in a hay stack, right? like in this million tokens, can we find this one string? And they do pretty good at that. But ask it to reason or make decisions or somehow have attention over

45:51

those million tokens and you see um you see much poorer results. I think uh the term that's coming around now is context rot. The more context you have, the more opportunities you have to confuse the LLM. Yeah, definitely. So like then from Raggy's perspective, retrieval and like

46:10

the connectors and all that stuff is all still principled in this new term context engineering. Like are people still considering like are y'all like kind of converting to like oh raggy we can help you with context engineering or is it still like a rag thing? I think from in my mind they're they're the same thing. It's retrieval augmented generation. Are we going to do a

46:34

retrieval to augment the generation that to ground the generation what in in you know my version of the truth right like my company's version of the truth and whether uh you label it rag or whe whether it's a retrieval tool that the that uh an LLM the model could choose to use or it's an MCP tool we're we still are want to be how the model gets the

47:00

the context it needs to have accurate results. Yeah, that's awesome. Um, and I think Raggy's also like expanded its feature set since, you know, a year ago. Do you

47:12

want to like want to walk through the the audience here? Like what are the like the the coolest things about Raggy? Sure. Um, we had a big push uh I would say six months or so ago to really

47:24

really nail um multimodal. So you can send us audio files, you can send us audio video files, and we'll like normalize that media so that you can even like stream it off of our our platform. So let's say that you have like a two-hour podcast and a retrieval hits something interesting at say an hour and 15 minutes in, when we give you that chunk, we'll also give you a a streaming link that'll let you stream it

47:51

from that p particular time stamp. So you could build like really rich user experiences where searching over thousands and thousands of hours of of audio or audio video. And it was kind of interesting to figure out how to do that in um especially uh with the video side how to sort of marry the transcription to the uh like textual description of

48:17

the video. So we we like transcribe all the audio. We also send the frames of the video for multimodal uh descriptions. We then kind of like group those into 15-second chunks. So again,

48:29

back to the chunking thing, you you chunk video much differently than you'd chunk a PDF um or text in a PDF. So we uh we did a lot of cool work around that. Um uh recently we've been um focused on MCP. Uh so basically our retrieve tools can be used uh as an MCP tool either for

48:50

like desktop style applications like clawed code or cursor or um we're also having some of our customers look into like plugging us in directly as MCP tools for their agents. Um and then another big thing that we've focused on is a uh is a search agent a deep search agent. So we have customers are kind of all over the map and and different degrees of of

49:16

sophistication and we've seen a lot of people um coming from fairly complex domains like uh financial analysis or insurance tables or really dense hard documents and then they want to be able to like ask a hard multiaceted question um of these documents and like a naive rag just kind of breaks down there. Like if you have like three or four questions like implicitly embedded in in what

49:43

you're asking it and uh the answer comes from maybe three or four distinct documents like a single um semantic search even with you know keyword layered on there might grab might recall the right chunks it might not and it might provide you know two out of the four chunks you need to answer that correctly and then as we know like LLMs love to confident. So then it'll

50:10

confidently answer with like a part of the picture. So um we we put some real effort into a uh this deep search model that does a lot of these uh techniques like uh query decomposition. So uh the query will come in. We'll break it down into multiple sub questions. We'll then kind of fan those out to multiple

50:30

parallel retrievalss. We'll see if we answered those sub questions. If we did, we might ask more sub questions.

50:35

different. So basically we keep working the problem until we've got a confidence level. So we have like eval steps. So

50:40

until we get a confidence level that we have the right answer from the right evidence and then provide that along with like the steps we did and the actual like bits of evidence so you can like link directly to like in that PDF at page 210 here's the table we're answering from. When you have different documents, are they like stored in the same store or is like each document has its own let's say

51:08

collection of embeddings to search or is there a way that do most users search across everything? Is that what like is that like the natural way for them to think or is it like I want to search just this document and then maybe I'll search these other ones or Yeah. Um so it's very use case dependent, right? Uh so we have like

51:28

different constructs like we have a the idea of a partition which is like a hard logical separation. So um a lot of our customers are actually like SAS companies themselves and multi-tenant. So they'll like use a partition to like segregate their users and then layered on top of that we have a metadata construct. So you can like kind of tag

51:46

each document with uh whatever your interesting whatever can enable your business rules. So maybe it's a group ID, maybe it's a user ID, maybe it's a topic. So then uh customers will like bas will sometimes hardcode their metadata filters or they'll have like a UI to build it and we're even starting to see people um

52:12

generate uh their filters on the fly. So you can kind of provide an LLM JSON schema of like what the metadata filter parameters they can they can play with are and then kind of dynamically generate that that scoping. That's cool. And for the search agent

52:31

that is like all that is that available through the MCP server like could I use that as a tool? Uh it is not wired up into MCP yet. Um there we're definitely talking about that. Right now we just have retrieval there. There's a little bit of a latency

52:49

mismatch. I think like this is closer to like uh you know deep research where it it can take a couple minutes and I think most like MCP calling is expecting like a tighter loop than that. So we're working through that right now but um we need to sort of like maybe create a very fast version of it that we can add as an

53:09

MCP tool. Yeah, that'd be cool to just, you know, have access to that search agent because like a lot of people don't probably don't want to do the effort to do exactly what you guys did in the search agent, you know. Yeah, it was uh it you know, it's a lot of work to build build an agent that's reliable. Yeah. And figuring out all the various failure modes that models have and when they can

53:31

be sort of like I think too confident maybe is is the word. You are absolutely right. Yeah, you are right. And I I've seen that more and more with some of our users that that we spend quite a bit of time talking to is

53:45

that they, you know, especially with larger organizations, they have all this data and it's comes across as, you know, PDFs and word docs and videos and audio and slack messages or whatever, right? And having the ability to have it and h not just like search it, but have something intelligently search over it,

54:03

I think is a big unlock for a lot of these large organizations. I mean, some of them are building themselves as you'd expect, right? Of course. But I do think having something that they can, you know, integrate much more easily is is a

54:17

big win for some of these companies. I wish they wouldn't build it themselves though because they're going to have to go through this whole discovery of like what you actually need to do to like graduate into having good rag. You know, they'll just build their their own rag team. It's like do you want to build a rag team or do you want to buy one? Yeah. like that's what you kind of

54:36

Yeah. And of course there's use cases for both probably but you know sometimes it's nice to say like hey let's just bring in the experts and they're worried they're worried about insertions like they think it won't scale like inserting all the data and then they worry about retrieval obviously and for some reason they think they can do it. More power to them but

54:54

I'm sure they'll be giving you a call Matt once they can. Yeah, I mean, you know, when you have your 10-page PDF and you you run it through whatever PI PDF setup you have and it's a happy case, it it'll probably work well. Um, we we have people show up with like 3,000 page PDFs and they expect that to get processed in in a reasonable amount of time. We

55:17

actually like we've made a lot of investment just in that stuff like you know sending stuff through like cues breaking up into smaller pieces seeing how fa how far we can push parallelizing that sort of work. So like if you send us a 3000 page PDF we're breaking that up into 3,000 separate jobs and like

55:36

processing one page at a time to try and get it completed in a reasonable amount of time. That wasn't like, you know, it's not like brain surgery, but it's the kind of, you know, uh, like plumbing work that takes actual time and, you know, you got to get all your monitoring right. You got to keep an eye on all your your cues and Yeah, it's real work there. Yeah. Uh, we have some uh, questions

56:02

from the audience. So, let's start with this one. How do you handle data updates, document edits? Do you reindex? We do. We We do reindex. So, um, if you

56:14

send us a new version of the file, we're going to, uh, remove the last one from our indexes, re, uh, reextract it and put it back there. Um, we've talked about like, can we be more intelligent? Just figure out what chunks have changed, maybe have like a hash of like the chunk, but we haven't we haven't

56:33

implemented that yet. Um, so right now it is get rid of the old one, put the new one in. for sure. And here's a good one. In what kind of use cases do you think

56:45

raggy works best when combined with MRA? Um, probably anytime you need uh MRA to answer questions grounded in your data or your files. Yeah. Yeah. I mean, I I've seen people use it

57:01

as basically like a tool for their agent, right? It's like they build a MRA agent. They set up Raggy and they they use a tool to basically allow their agents to search the you know search their their documents kind of like your you know your deep search agent that you've built but more specifically to whatever agent that they're building that fits their organizational needs. So

57:21

that that's where I've seen it. Yeah. Some some of our customers play around with it and really kind of go deep into how do I get this agent to search across my data, right? my organizational data

57:33

and Raggy has a TypeScript SDK also you could just use it and it just works TM yeah just just create some tools you know that is one of the things like you just if it has an SDK just spin up a couple tools give it to the agent and everything should just work should just wire together really nicely that's

57:51

that's nice when uh when there's just like an mpm package you can download and it just works I think that's how I've uh really from the I I wanted something that was like npm install rag so that that folks, you know, can can get a shortcut, get get be able to start shipping their AI features without having to do all the the plumbing work around pulling documents

58:16

apart and sticking them in indexes. Post this. This is the link to the TypeScript docs.

58:23

You know, if you want to use Raggy, have fun. There's a bunch of stuff you could do with it. Um what about reranking? Is that built into Raggy? Like yeah, we have a re-ranker built in. It's

58:37

um it's an optional parameter on our retrieval. So um you know there there's a classic trade-off there of like latency versus quality. So uh probably tack on maybe 500 milliseconds to worst case a second to your call. But if if you want

58:53

to make sure that the chunks you're getting aren't just semantically similar, but actually do address a query, it's an important feature. For sure. For sure. Awesome. All right. Should we take one more question? Shane,

59:05

what do you want? What should we do? Yeah, let's Yeah, sure. And Matt, is there Did you have a demo you wanted to show? I don't know if you

59:11

Yeah, let's do that. Yeah, I can do a demo. Um, so hold on one second. Let me figure out she's screen

59:17

sharing in here. Yeah, you go pull that up. I think one of the challenges with a service like this is is always figuring out like you need to provide the right dials to for for your users to turn and I feel like that's something that you I know like I I've talked to the team and you think a lot about is like enough flexibility but also enough hands-off where you don't

59:35

have to worry and it's a constantly I imagine maybe it differs for certain customers but ultimately you you want to provide like a very seamless experience but enough customization where you can turn on re-ranking if you need it and do those extra things that you may not uh as a user, you know, you you need the

59:51

flexibility, but also like it's a challenge. That's kind of like one of our design goals from the beginning was um expose enough knobs, but not too many knobs and make it work out of the box so you don't have to sort of go look under the hood unless unless you know what you're doing and you want to. Um so, uh this this is uh an app that we have called base chat.

1:00:17

Um, it's our open- source chat application that uh we kind of use as a uh a reference app. Like whenever we build a new feature, we tend to build it into base chat just to make sure the the DX is is there. Um, so we have this deep search option here. So there's like

1:00:36

different types. We can do different types of searches here. So we're going to do deep search. We're going to do it

1:00:41

on fast just for in the interest of time. Um and then this is uh a a benchmark called called Finance Bench. I guess I can't go to the tab, but Finance Bench is like a pretty rigorous um uh financial data set and about 150 questions related to it. And the uh the

1:01:00

document store is I think 350 10k filings. So it's like a pretty non-trivial rag problem. So I'll just run this one sample from there. So, you

1:01:14

kind of uh see the the agent thinking here. It's searching um it's going to look for the American Express 2022 annual report. Look for the largest liability balance sheet. Um and so right

1:01:28

now it is uh under and we'll see this in a second, but it it's doing two or three different variations of that query. Um and it is it just completed it. Uh, so it ended up asking these questions that it searched for and then it found these results. Um, so I'm not going to like dive too much into here, but you can

1:01:52

this came from one of the 10Ks. Um, and you can see it decided it got enough information to answer. Uh here's the answer that it's going to be using and then it's handing off to like a citation sub agent right now to to take that answer and um put in the citations so that they're clickable. So yeah, there we go. That's uh our search agent. If you put it on harder modes, it

1:02:20

will uh like do stuff like forcing a planning step in the beginning. It will decompose more questions of your query. it will do more aggressive evaluations of substeps like on our on this fast mode. It kind of just evaluates the final answer. It allows it to search right from beginning. But like as you

1:02:38

adjust our effort level, we're not just like using heavier models. We're also um providing different business rules for what the agent can do at any given step. And then also uh you know just budget stuff like you can use this many tokens you can go this many turns and this is is this available then in

1:03:02

your SDK or through your API? It is it is so uh it it has um like a synchronous mode and a streaming mode. Um this is using the streaming mode. Um

1:03:15

but if you go check out uh our SDK, it's under um the responses uh section of our SDK and then the model name is called deep search and we kind of followed the uh the OpenAI responses um API uh for like the shape the schema of these responses. So if you have a a responses compatible client, we should be a drop

1:03:40

in. If we're not, hit me up and I can help figure it out. But so far it's worked well.

1:03:46

That's another pattern I keep seeing out in the wild is building it into the responses API. So like the friction for the user is negligible, you know, to take advantage of all these powers. So like sick.

1:03:59

Yeah. Yeah. I you know when we got to that part, you know, do we want to create our own protocol for these streaming and these responses? And even if we did, even if we did a good job, are we gonna like the lessons they've

1:04:12

learned up until now that, you know, we can just kind of piggy back on that rather than try and reinvent a wheel, get it wrong twice and Yeah. Yeah. This is awesome.

1:04:24

Yeah. Makes me think of all the things I could uh hook hook this up to in in MRA and uh build an agent that has a basically a sub agent that can do do deep research on sets of documents. So, that's pretty cool.

1:04:36

Yeah. Um that's definitely like I I I see like that a a search agent as opposed to a research agent like a research agent it its output is like kind of a report. The output of this is more or less a fact with evidence to support it. Yeah. So I think it composes well into aic

1:04:56

workflows where a report might be too big and heavyweight. Yeah. And you know, if they're using the responses, if y'all are using the responses API, you could actually wire this up directly to our agent primitive.

1:05:10

And so like it would work just the same. So that's really cool. That's the nice thing about if it does use the responses is just you could use this as the underlying model to an agent pretty easily.

1:05:22

Yeah. And then and then use it as a sub agent connected with other agents or in workflows. And you know, it's very composable. So it kind of slots in very nicely. Yeah. And I guess um this is just like

1:05:34

one UX. You know, obviously you could put any any UX on top of it you'd want. Um but we found that like people kind of really like to be able to poke in and see what the agent did at every step.

1:05:45

You know, the chat GPT where it's very like ephemeral like what it's working on. You can kind of click on it and see. But like like showing the whole history here, we found that like it it increased confidence for users that we're like we're not hallucinating. your documents say this. This is where we found it. Yeah. Just it's like show me proof of

1:06:05

the work that you did so I know I I can valid I have the validation if I want it. Exactly. And for I guess my last question here on fin on this finance the documents for finance chat or bench um you could either upload these documents programmatically or through the raggy dashboard. Is that correct? Yeah. So, um, we have our data

1:06:30

connectors. So, when usually if I'm doing like one of these rag benchmarks or or whatever, um, I'll just go and I'll drop them all in a folder in G Drive and then I'll just connect that G drive and go get a coffee and come back in a little bit and start running it. I mean, it it it's actually kind of like a

1:06:51

non-trivial data set. there's like uh 350 100page PDFs. So yeah, it it takes a little you know tens of minutes to sync that completely and you could write some code to do it directly with the SDK or make REST calls to us but sometimes it's easier to just throw it in a folder and do a quick authorization flow. Yeah, I agree with that. Be way easier.

1:07:16

Way easier. Yeah. And I and one question related to that. So Raggy takes care of storing it

1:07:21

all. Yes. Right. You sync it, you store it, you do all the the chunking necessary,

1:07:26

which which is really cool. I mean, I I think that's that's interesting when you when people think of, okay, we I have all these data, all this data in all these different systems. I can just connect it, drop it in a folder in Google Drive or connect, you know, whatever other external systems through the connectors and then the data just gets sucked in

1:07:44

and now I have an API that I can search against and that's, you know, that's pretty cool. Yeah. So much glue code involved with doing it. Like I would just not do it

1:07:55

personally. Like we have a lot of glue code. Yeah. Let let Raggie handle the glue, you know, just push it together.

1:08:06

Awesome. Uh well, Matt, we really appreciate you uh joining the show and answering some questions and hanging out with us for a bit. Uh, any anything else that you wanted to talk about before we move on to the next AI news segment? No, thank you for having me. This is

1:08:24

really fun. Yeah, let us know when you come to the West Coast for a Raggy event so we can uh have beers properly. Awesome. For sure. Yeah. Yeah. Looking forward to hopefully seeing in person at some point. But if you do any new big things you have that

1:08:38

come out at Raggy, feel free to come back on the show and tell us about it. We'll do. Awesome. Thanks, guys. See

1:08:44

you, Matt. See you, Matt. All right. I remember um talking with Muhammad at

1:08:51

one of the raggy events we did like during YC and it's just cool to see and even base chat like Bob was talking to us back then about this what we're planning on doing. And it's awesome to see it all is now reality. Yeah. Like the integrations, the multimodal thing. I remember Muhammad

1:09:09

was like I don't know if he was stressed but he was drinking a beer. He's like, "God, we got to think about multimodal images and audio." And it's like now it's reality. So that's dope. Yeah. Yeah. It it is uh it's just one of

1:09:21

those things you you see at a point in points in time, you know, over the last what six plus months probably, you know, I would say maybe eight months since we had our first raggy event with them. Yeah. Yeah.

1:09:33

Been hustling. Yeah. They're making moves. All right.

1:09:39

And to catch everyone up, if you are tuning in, you know this is AI agents hour with Shane and Obby. We talked with Ismile from Super Agent. We just got done talking to Matt from Raggy. And now we're moving on to talk about some AI news and probably a little drama. I'm

1:09:57

sure we will weave into this, but yeah, let's let's talk some news. So, uh, first, do you want to share this context engineering workshop? Yep. So,

1:10:11

let me get to my thing here. So, if anyone's interested, this Friday we are doing a joint workshop with uh Zero Entropy and Mezero. Zero Entropy is um from our YC batch. They do a lot of cool things around rag. Um especially now well I guess it would be called context

1:10:40

engineering. Um they have different uh they have a reranker model. It's very nice. Mem zero. The CEO Taranjet is an

1:10:47

absolute legend. So I just have to say that anytime I uh talk about mem. But we're just going to be talking about context engineering as a whole. We'll be from our perspective. We'll talk about

1:10:58

our memory implementation and everything like that. But if you all are free, come join. There's a hell of people. Um, yeah, that's that.

1:11:10

And kind of talk still talking a little bit about MRA for a second. We did have this that was shared if I can actually find it. Uh, which was just awesome. So,

1:11:24

a community member just built this and it's like extremely accurate from, you know, me looking it over and I thought it'd be worthwhile to share. So, just to highlight Eugene, thanks for putting this together. And if you're looking for just a highle view of kind of how MRA works under the hood, this is a pretty good conceptual diagram of how it all of

1:11:50

a lot of the stuff that happens underneath. Yeah, I we were really impressed when we saw it because it's really it's really it's just 100% correct actually. Yeah, it's just accurate, you know. It's just and it it it's made to feel like simple. So, if you actually read this,

1:12:06

you'll kind of get an idea of like how things work together and how the different primitives kind of stack are composable. Yeah. And Yeah. So, yeah, thanks to Eugene for throwing that

1:12:18

together. Yeah. I believe he works at Canva, so That's tight.

1:12:26

And also Vinnie put together this really awesome, you know, it's an open source recipe agent software as a service. So basically it's a whole like software as a service platform that's a recipe agent. So it's a really good example. You know, has Stripe, uses S3, has agent memory, it's

1:12:46

full stack. You know, there's a whole 22minute video that shows all about how it works. It's all open source and basically it's an entire platform agent, right? That's just a recipe recipe agent. So, if you're looking for like a full stack application and seeing like

1:13:02

how you might wire up something to deploy to users, this is a pretty cool example. Yeah, they have a framework called Wasp. Um, and I believe they have this open SAS template that they put on Product Hunt which went bananas. So, yeah,

1:13:19

you know, great for them, great for us, great for everybody. So, it was really cool. Yeah. So, check that out. And with that, let's talk let's talk a

1:13:32

little bit about uh is Lovable Dying? Is Love I'll share my screen for this one. Is Lovable Dying? One second.

1:13:42

So, this happened, I believe, over the weekend, and it's like perfect bait. It's like the best bait that there might be. Um, what? So, yeah. So, this is like web traffic.

1:13:53

This came from crust data. They're from YC 24 or summer 24. Um, so they kind of compiled this this uh this image here. And

1:14:08

having lived through this era on the X-axis, like especially us this year, we kind of see what was happening. Like definitely when we when we saw lovable, it was like the all the rage and maybe, you know, maybe it's going down or things normalize. The other ones too, like I would say Replet kind of came out of nowhere. Uh

1:14:32

Vzero is always a thing, but very steady. Uh, bolt came out of, you know, freaking nowhere and then maybe it's becoming more steady. And I don't even know what this last one is. Um, oh, emergent,

1:14:44

which I once again I still don't know what that is, but what I think what struck this was this other this one was also bait I guess. But these are some YC companies that are like lovable for X. So like you have lovable for non tech, lovable for internal projects. And then what really got people like what is funny and what

1:15:10

really got people going was what is lovable for software engineers? Isn't that just lovable? And so as you can see like the internet blew up and you know so then this led to once again this and I guess you know it's just bait for everyone to have some opinion or think about it. But I guess what's everyone else's opinion?

1:15:34

So I I think that original diagram was mostly like search traffic, I'm guessing, right? Like what is it? It it must be search traffic. So I could see search traffic, you know, spiking

1:15:46

because it was such a hot moment, right? And they kind of had this incredible just liftoff and then I could see it coming down. there was some kind of like followup and maybe you can find the the the follow-up that just shows the chart that's going up and I think that's the revenue but then I also heard someone

1:16:04

else say is that your churn because it was like joking you know it's like because these tools I do think that you know there is inevitably some going to be some churn with these new tools right there it doesn't solve everything it it solves a lot of things and you can get to prototype and for certain things because I I know a few people personally

1:16:23

that in this case have used lovable but others that have used replet and you can actually get things into production as well. It just kind of depends on what you're building. So, you know, we we've used Lovable in the past. We obviously

1:16:34

use Replet a ton, but I thought this was a good uh Yeah. So, one says like the web traffic data is incorrect and then looking at users who decide to pay. Yeah.

1:16:45

So, you know, and here is the interest over time. So I mean I guess it's it's like web traffic is not necessarily the way to look at this. I agree. But then like

1:17:00

vibe coding as a concept and all the products are I guess getting less searches. So that's just interesting. It's all to bait us though. Really it is

1:17:11

just to bait us into thinking like you know like I don't know. Are you baited at all? I'm baited to be like, "Oh, okay. Vibe coding is X or vibe coding is Y." We also have this vibe engineering

1:17:24

we talked at at the top of the podcast today. Like, it's just I don't know. I I just feel very It feels very icky. People need uh need things to talk about

1:17:36

and in this industry, they're going to find, you know, they find contentious topics and get get people on either side to argue. And that's, you know, part of the the cringe, but also maybe part of the fun for some people. That's why they they pay attention to things on all the drama on X or wherever else, you know,

1:17:54

other other social channels that that it lives on. But I do think, you know, the vibe coding platforms had an extreme popularity boost and then I think there's probably some sense of like normalization of like, okay, what can I actually accomplish? And there are real things you can accomplish.

1:18:11

Totally. And yeah, I mean, I think just we should just leave it at that. these these platforms are hugely popular. They're still like like that's not going away. Are I could see a case to be made where

1:18:24

maybe they their growth slows a little bit because there's only so many people that want to build apps at this moment in time. But you know who benefits the most from this slander? Think about it like that. Who benefits the most? Their logo is a butthole. So

1:18:41

those are the people who who who like benefit the most from this type of stuff. So they have their own CLIs, SDKs. They don't want these products to have popularity. You know, they want you to use their agents to

1:18:56

build agents. They want you to use their CLIs. They want you to use their tools. Yeah. I mean, that's that's probably

1:19:02

that's probably very likely the case, you know, but but again, they benefit on both sides. Even if you don't use them, you're still using their models or maybe using their models depending on on who you're using. So it's like it's like they don't want them to win, but also if they win, they still win. So yeah, but they want they want them to win first. You know,

1:19:20

they'd rather win first, but but at least it you know they want butthole dominance. They're guaranteed to probably win second for sure. Yeah. Uh all right. So

1:19:32

this has been kind of a hot topic and I don't think we'll have time to read the entire thing. But this idea and I've seen some posts about it after the fact, but it's agents 2.0 from shallow loops to deep agents.

1:19:48

So it's basically arguing that you know it's kind of been a shift a agents the whole idea like building agents has been a shift from kind of simple agents that call tools in like a shallow loop right to this idea or more of an architecture around deep agent where there's an explicit planning step there's maybe

1:20:09

some delegation to sub aents you have persistent memory and then obviously you know context engineering so that's kind of the tenants of this deep agent. So the idea is it's a little bit more of a complex architectural diagram around agents. I mean I don't think different like I don't know but but I do think

1:20:31

it's there is some distinctions between kind of like the way that a lot of people think about agents in this. What are your thoughts? I mean we've already been we've been agent 2.0 when there was an agent 1.0. U

1:20:44

I don't think I ever thought of agents other than this 2.0 no diagram. Uh we have a primitive for it too called networks. And even if you weren't using a network, you could you essentially have this going on at least in MRA. Um

1:20:57

so yeah, nothing new to me, but I understand where they're coming from. Especially if Simon Wilson said that agents are just tools, loop, and a goal. That does make it seem very shallow. So building on that definition, then I guess we have this other one.

1:21:13

Yeah. But but I think it's you know it can be confusing to people because if you look at you know kind of how you know early like frameworks right you had lang chain which was just like a chain of LLM prompts like that was the whole idea right it was just a that's why it's called a chain right like string a couple together and then you had people like uh crew AI and then open

1:21:37

AI had swarms or whatever right which is just and there was some others that are just like no this is just a bunch of agents that talk to each other and good luck and then obviously like it coalesed to the middle right which is and we're going to talk about this you the idea of okay maybe some things need to be predefined workflows so you had you know lang graph you had master workflows crew

1:21:57

and announced I think they have flows or whatever at one point so it's this idea of okay you need some more control and now as the models have continued to get better it's like well okay you you still need some level of control but maybe you can delegate certain things to sub agents but I do think there are also some frameworks or SDKs that really were

1:22:16

just LLM with tool calling and that was the their version of agents right and so if you if you're coming from that maybe this level of sophistication is new or is like you know a little bit more technically challenging we've kind of been working on it for quite a while so it doesn't seem new to us but I do think to to many people if you you think

1:22:35

about this and it was goes back to you know and we talked about this I think it was last week or two weeks ago you know kind of Swix's talk at AI engineer Paris, right? But the the um operating system, the LLM operating system, which we said, oh, that's basically looks a lot like a MRA system diagram because it

1:22:52

there are a lot of components to building more complex agents that you may need, right? So, yeah, I'm an agent 2.0 supporter, I suppose.

1:23:04

Um, but this does conflict with Cognition's anti- uh multi- aent system, right? So, you know, at at some point it gets confusing, y'all. Like, you know, one week we're anti multi- aents, another week we're now into deep agents, which is also a stupid name, but you know, whatever. I will get on that bandwagon. Deep

1:23:27

agents, a terrible name. Yeah, because well, it's the it's the antithesis of shallow agents. Like, I mean, that was a dumb name, too, to be honest. But yeah, I mean, so concept I I'm be I'm

1:23:38

behind the concept. the naming. Come on. We need better names. Yeah, dude. We should like before they

1:23:43

name stuff, we should all just get in as like a circle and like, you know, brainstorm the names before they become I mean, some of these decisions we're going to be stuck with for months. Yeah. So, let's let's think about these things. And then like our customers are like, "Hey, do you guys support deep agents?" And we're like, "Man, what the is a

1:24:00

deep agent?" And like like, "Yes, here's how you can do it very easily in a few lines of code." And they're like, "Oh, okay."

1:24:07

Yeah. Do you support ambient agents, deep agents, shallow agents? Yes. That's just the answer is yes. I don't know. But yeah, I mean I I do think that some

1:24:19

of these terms have a tendency to stick and when people don't um when they're not, you know, you can't stay on top of everything. So you like pick up on some of these terms and then it makes sense. And so now you you're going to use that term. But unfortunately then, you know, whoever has the sometimes the loudest

1:24:37

voice or the an interesting idea at an interesting point in time can, you know, come up with a term that we're stuck with for a while. Yeah. What do you think? You think Deep Agents is a whack name? Uh, put it in the chat if you do. If you

1:24:55

don't, you know, have a good answer for why it's not. Yeah. Yeah, and I think this is probably a good transition because we were just talking about how we got from swarms of, you know, from chains to swarms to workflows. There's been a ton of workflows related just releases and announcements in the last week.

1:25:16

Obviously, the big one, we talked about this last week, was OpenAI's agent builder, which is basically just a drag and drop workflow tool, right? Yep. And we also talked about 11 Labs last week. I think it just announced just before our show basically just before open a AI announced their agent builder

1:25:34

is 11 Labs has what their own workflow builder more towards voice agents of course but you know they have they they announced that and it's kind of like a drag and drop builder then we had this um you know our our friends over at Verscell did uh announce they have AI elements for visual workflow builders. So building agents visually is a cute

1:26:02

idea but only if you can eject the code. I agree with that. Use multiple models and agents and remain platform agnostic.

1:26:09

So you know AI SDK or Burcell is kind of in the mix at least they they're shipping tools to make it easier to build workflows. So that's definitely a step in that direction. We then had another uh this is and I don't even like to show this because it's a tease. They didn't actually announce anything yet, but I thought it was kind of similar. So,

1:26:36

Firecrawl is announcing an agent builder, a visual N8N style workflow builder for AI agents. So, connecting Firecrawl, dev, LLMs, logic nodes, and MCPS. So, you have a v visual builder. And then to top it all off and you know probably my favorite of of the group

1:26:57

um I don't have the link here actually. Do you have the link to Ked's? Give me one second. might. So to tee it up, someone in the MRA

1:27:11

community, and of course it there it's still probably not production ready, but it's not perfect, but someone in the MRA community basically just built a visual builder for MRA, which is pretty cool. Yeah. Right here. And they have like the UI. So you can like drop an agent and

1:27:31

then you can connect some stuff, you know, pretty cool. Takes all of our primitives. Um but yeah, so then people are asking us when agent builder for MRA and it's like yeah, we'll do one probably.

1:27:53

Yeah. And a couple comments here that I think make a lot of sense. These are more so building workflows and not actual agents. And then workflows seem something new today,

1:28:05

which is funny because yeah, workflows have been around for many, many years, right? And we've had gripe with this dude. When we got into YC, we were like, people were like, "Oh, dude, I'm AGI pilled. Agents will do everything. The LLM is my workflow."

1:28:22

yada yada yada. Okay, which is fine. You know, that was a belief. We were like, nah, workflows are needed because you

1:28:29

can't trust this 100%. Many months go by. Now it's like a huge discourse now on Twitter about workflows versus agents. Once again, uh we've been

1:28:40

talking about this for a while and it still seems we cannot agree still after all many months, we cannot agree that workflows have their purpose, agents have their purpose. But now it's even more confusing because an agent builder is building a workflow where LLM may be a step, right? And now it's just more confusing to people, I think, you know, sucks. Yeah. Yeah. I mean, we have a So, Alex

1:29:07

on the Master team put together a workflows versus agents video, which is pretty cool on our YouTube channel. So, if you go to our YouTube channel, if you want to get our take or at least Alex's take, which I agree with. Yeah. pretty

1:29:21

whole wholeheartedly. Yeah, there's a there's a place for both, right? It's like where do you want do you want more control? If you can make something a

1:29:26

workflow, make something a workflow. If you know it deterministically, then yes, make it a workflow and call out just to an LLM where you need to. And there are other times where you want to have a workflow that actually calls out to an agent to go run for a while and then come back. I mean, these things should be composable. We shouldn't be saying they're the same thing because I don't

1:29:47

think they are. I think a workflow is something you're deterministically choosing and uh an agent is something where you know that there's going to be some level of of autonomy there right in how the choices are made but they should be able to be composed together and I think it's that's sometimes uh with some of these workflow builders it's just a

1:30:06

matter of like what even is an agent at this point because it could be a workflow if you say agent you might mean like a v like a visual drag and drop with one LLM call that could be an agent or it could mean some multi- aent system that has deep research and pull goes into you know who knows what APIs

1:30:23

naming things is hard but despite it being hard we have up tremendously in naming things you know like you got the brightest minds in the earth doing right now and we still can't name properly that just makes me laugh yeah all right so let's continuing on so some people on the team shared this and I thought it was really interesting.

1:30:53

So this is from Alex Hughes and uh I'll read the the context here for those that might be just listening. So there is a claim that Stanford just made fine-tuning irrelevant with a single paper. That's a bold claim, but it's called agentic context engineering. It

1:31:12

proves you can make models smarter without touching a single weight. So instead of retraining ACE, you know, agentic context engineering evolves the context itself. So the model writes, reflects, and rewrites its own prompt over and over until it becomes a self-improving system. Think of it like the model keeping a living notebook.

1:31:31

Every failure becomes a lesson. Every success becomes a rule and has really good results. And there's some numbers here. And so everyone's obsessed with short and clean prompts. But AC flips that. It

1:31:43

builds dense, evolving playbooks that compound over time and never forget. Because LLMs don't crave simplicity. They crave context density. If this scales, the next generation of AI won't be fine-tuned. It'll be self-tuned. So,

1:31:57

we're entering the era of living prompts. Then there's, you know, a a link down here to the research report and a whole a whole thread on it. But that's the high level, right? is that it's the idea of rather than the rather

1:32:12

than needing to fine-tune a model with data, can you change the your context over time to then get better results, right? So, it's basically it's learning and I don't think this is like revolution. I mean, the paper itself might be, but the concepts again, this is something that if you've been using memory in your agents, you're already your agent should be getting better, right? It might remember your name. It might remember other things. This just

1:32:35

kind of takes it to a different level or a different extreme maybe. Yeah. Like the memory strategies that are employed today like long-term mem working mem semantic recall, they're not necessarily like a evolving process, right? Like a human or like you as a user needs to make sure that over time

1:32:54

like these things are good, you have like the right messages in the window etc. But having like a workbook or you know we call it something else internally because we are building something similar. So like this white paper kind of like acknowledged our thinking which is great whenever that happens. Um but memory should be more

1:33:13

like a human and we are all our memories evolve over time. Uh we things that we thought in our 20s are different in our 30s. Times change. Everything changes.

1:33:25

Um, so it's not necessarily I didn't have to fine-tune my brain. I just had to evolve or observe and realize what are what's factual, what's not. And I totally agree with the paper. The problem is in the comments there, like

1:33:37

it it doesn't fit the mental model of what context engineering means today, which is fine. It's just people don't understand it. Maybe it's a naming problem. Agentic context engineering. I don't know. But it makes it seem like

1:33:51

it's the same strategies of that people employ today. might be something different actually altogether. Yeah. Well, I mean in this case it's almost like giving the agent some

1:34:02

flexibility to control its own context, right? You're not just doing the context engineering yourself. There's some element of the agent keeping memories, important concepts, condensing information over time, but still, and this is where I think it it really makes sense, is that your agent could still have access to all the memories ever,

1:34:21

right? It has access to the data source. So it has context. It has tools to maybe go out and get more context from the the

1:34:28

actual maybe doesn't have the full message in its memory. It has a part of it and it can go out and pull that information in. So it can essentially you know modify its context for the prompt that comes in or over time to keep the most important things. So it's

1:34:42

essentially learning but it's filling like the right amount of context depending on the model for the right amount of time. And I think there's going to be a lot of strategies and eventually there'll be tools to help manage this. you know, master will be one of them. There'll be other tools out

1:34:55

there that that help you kind of manage this context. But this does kind of push back because some people, especially if you're an engineer, you want to see the prompt. You want to know everything about the prompt. But as these contexts get larger, you're it's going to be hard

1:35:08

to know like everything that actually goes into the prompt because ultimately your context is almost compiled between different things and then put together and you have to under like if you're using LLMs to generate the window then you need to understand why it generated that way like it becomes a lot harder but it's probably the right thing to do

1:35:26

like for example in the human brain like there are it's not a single agent like if if we're try to com make a comparison here Your brain is not single agent. It's like multiple neurons firing at all times. And some maybe are using large language models. Some are using SLMs. Like it's not the main thread that is updating its observations or its

1:35:46

context. It's maybe sub aents or subtools. Like for example, when you like observe something, you're not waiting for your tool call in your head to be like, "Oh snap, it's raining right now." Like that's already happening asynchronously. And then you get fed an event because right now it just started

1:36:02

raining. So that just happened to me right now. Um that's the type of that's going to happen. That's what I

1:36:08

was saying. SLMs are probably the future and we'll probably see a lot more in that in this space. We also will contribute to this era or this subject area. So stay tuned and see what our

1:36:20

next memory implementation is. And uh Alatar, I don't know if I'm pronouncing that right, says seems we're reusing all technologies as new ones. I would agree with that. a lot of things

1:36:33

that are what's old is new. Again, when we're talking about workflows, we're talking about, you know, some of this context engineering. And also, you know, naming is underappreciated. Naming's hard. Naming is hard.

1:36:46

That's why I like I thought memory was a good name for this but maybe not. I don't know. Yeah. As human as possible is the naming is what naming should be.

1:36:58

Agreed. All right. Uh so just a few more things. We're about to

1:37:03

wrap up here. Uh I don't know if this is actually launched yet, but I wanted to share this from last week because I think you know it's it's always the Abby and I basically talk about the bar test which is what can we ship when we're at a bar and uh so now Anthropic is preparing cloud code to be released on the mobile

1:37:26

app. So you can connect cloud app to GitHub and run coding prompts on the go. So I don't know when this is launched yet. Maybe it's already launched. I haven't heard about it actually launching. If this if it's true, I can

1:37:38

tell you Obby and I will test this and we'll report back. The bar test is the best test. Yeah, it's what what can we ship over a drink at the bar? We should do the bar Olympics.

1:37:52

It's it's like Dex is having like the the Vibe Code Olympics. It's like the bar Olympics. Like you while you're drinking on your phone, what can you ship that's valuable? Yeah. Like I'll use cursor, you use cloud code, someone uses codeex. We all

1:38:07

have a pint and a shot. And there you go. If anyone wants to participate, you know where to find us. We will set it up. We're right here. We're right there in

1:38:18

the wait there. The dog patch. So you can dog patch. come to the saloon and we'll

1:38:24

we'll have the Olympics. Yeah. If you ever want to possibly run into us when I'm in SF, if you go to the dog patch saloon, you might see Abby and me there. We just 99.9% chance you'll see us there.

1:38:36

We have doxed oursel right here on the, you know, live stream. Uh what caught uh Alert Allar's attention is that we use brain analogies to build your systems. I mean, we try to when it makes sense.

1:38:49

Yeah. the personification of of uh human stuff. Shots and PRs. Shots and Ruben, dude, I think you're like my kindred spirit, dude.

1:39:01

Great questions today. All right, let's end with this one, which is just kind of fun. So, AI music has kind of hit the scene. I'm a musician. You can see I'm in a good place today to to you know, so I I will

1:39:17

say like ju this is just a true story. So, quick quick tangent. I remember like when all the image models came out and I could build really cool images that I could never conceivably design, I was like, "This is great. There's no negative here. This is" And then I talked to some like graphic artists and

1:39:32

they hated it, right? Of course. They're like completely opposed to anything because it's a new tool that maybe makes their skills slightly less valuable, right? Like that's the idea. And then with AI music, I kind of felt it a

1:39:44

little bit because I was listening to a song and I sent it to some friends and then one of my friends is like, "Yeah, dude, that's AI generated." I was like, "No way." Like it was a banger. That thing was not AI generated. And it was like it was in my genre. Like I'm kind

1:39:57

of like like heavier rock music. It was right in that genre. And then I kind of did some research, couldn't find anything about the band. And then the singer wasn't always the same between their songs. I'm like, "Yep, that is AI generated." But it was pretty good. And

1:40:10

so as a musician I'm like maybe I am irrelevant now you know so I do think that uh you know these tools can be you know can be used and I do think there's still a skill to using these tools to generate something creative but it is just an interesting time to be alive that yes like you can use tools someone with much less musical skill but maybe a little

1:40:32

tenacity and knowledge of how tools work can create something that's pretty cool. So, did do you want to share? Yeah. So, I found so over the weekend and it might not it's probably been out

1:40:45

for a lot longer, but I found this YouTube channel and I'll share it right now. It's called Almost Real. And they have all these different like just songs that they've made turning like rap music or like pop music into different genres. Um, a lot of it's turning it I think what they found is they found some type

1:41:06

of like strategy to make songs into Mottown soul because a lot of their like songs are like that and it's and they're doubling down as you can see. They're turning everything into Mottown soul and the song that kind of got me into it was I found like I just saw this randomly so I'll play it. Hey, it's

1:41:31

all right. So, I'll play a little bit of this. And should I turn it down or up?

1:41:37

Uh, it could be down just a little bit from where it was at, but cool. Oh, dude, I should get YouTube Red or whatever. Um, okay. Let's skip this.

1:41:57

I just want to chill and twist. Now we're going to get a YouTube copyright. Worst thing about hope you can come up with the answers like when I was in high What was that? What was the name of this album?

1:42:44

Um, Almost Real. Almost Real. Yeah, I'll post it in our chat here for anyone who wants to listen. I listen to a bunch of songs there and like I pretty much like I

1:42:54

think like this week I'll probably just be listening to all these like uh these AI generated stuff because it's so good and like I mean I'm a big 50 Cent fan so I think it'll resonate with you if you like the original artist and then you see it. Um but yeah, go take a look if you're interested. There's so many like

1:43:12

56 videos Snoop Dogg like from Snoop Dogg Cuddy to you name it there. They got it. Michael Jackson's on there, too. But they are doing all Mottown Soul. So, if you also like that, which I do, it's

1:43:26

definitely a banger. Um, new official outro song. Yeah, potentially. I mean, then then the channel's definitely getting taken down. Yeah.

1:43:37

You got to be careful. Like I I've learned this the more we live stream. When we play videos, even if it's like a promo video for something, like YouTube flags anything that has any copyright, and then sometimes they'll pull down our our stuff unless I contest it. Oh, really? Do you think they're going to contest this? I don't know. We're gonna find out. If

1:43:54

if you can't see this video in a couple days, maybe they did. I'm sure. Well, there's many levels of Yeah. copyright away from us, right? Because then it's them and then Yeah. Well, and yeah, it is a

1:44:06

copyright's a tricky thing, but you know, for us, it's not like we're trying to we're just trying to highlight the stuff, you know, but but it is what it is. Those are the rules. We just we try to play by them. Yeah. We're trying to with the friends. Yeah. All right, everyone. That's all

1:44:22

that we have today. Thank you for watching or listening to AI Agents Hour. As always, we do this on Mondays, usually around noon Pacific time. If you have not got your copy of Principles of

1:44:36

Building AI agents, you can go to master.ai/book and get it for yourself. You should uh follow Abby and Dude, I it's hard to keep a straight face while you're doing that. Follow Abby and myself on X if you're

1:44:56

not already. Follow MRA on YouTube. If you're listening to this in the podcast version on Spotify or Apple Podcasts, please go give us a five star rating. It helps feed our ego, but also just helps other people find the show. And that's

1:45:11

it for today. Yeah. Thanks. Peace.