
Agent Networks deep dive, Tool compatibility, Dealing with downtime

June 2, 2025

Today, Abhi covers some AI news and then goes into the architecture of Agent Networks with Tony from the Mastra team. We end the episode with a Mastra pairing session where we go further into Agent Networks.

Guests in this episode

Tony Kovanen (Mastra)

Taofeeq Oluderu (Mastra)

Marvin Frachet (Mastra)

Episode Transcript

2:04

Welcome, everyone, to AI Agents Hour. I'm Abhi, I'm from Mastra, and thanks for joining us today. It's going to be a pretty chill stream. First we're going to play "Where in the world is Obby today?" I am in a small town in Belgium called Mechelen. I don't know if you've ever been there; plus one in the chat if you have. This will be a chill stream. I don't

2:32

even know if the Wi-Fi is that good, but thanks for being here, everyone. We have a nice show as always. We're doing some AI news, then we're going to talk about agent networks. I'm going to bring Tony on, and then we're going to do a pairing sesh with the team. Should be pretty chill. Yeah, let's get into it. Oh, also some

2:57

quick news, I guess, or pre-news. We have this AI agents book that we wrote, Principles of Building AI Agents, and a V2 is shipping for it, which is great. I'd like to give out some copies of V1 to people watching. Now, there are some rules here, and I'm going to have to state them over and over again, like we

3:27

do in livestreams. But let me tell you the rules, and then you yourself could win a book and we'll ship it to you. I guess I'll have to repeat that multiple times during the stream. We're only going to do like three today, so this is how you get one, and I'm going to have to say it multiple times again, but:

3:49

if you tweet about the book and tag me, and we select you, we'll send you a DM to get your address. Here's the problem, though: we can only ship to a handful of countries. So the US, of course, Canada, Sweden, UK, Germany, Japan, Belgium (where I'm at), France, Australia, Spain, Italy, Netherlands, Poland, or

4:20

Switzerland. Otherwise you'll just have to settle for a digital copy. So yeah, just post a tweet about the book and how much you want it or something, and we'll send you a copy if you live in one of those countries. All right, it's weird not having a co-host for this. Thankfully I have some friends on the way, but let's get started with

4:44

the news. The first thing I wanted to talk about today is kind of AI related, kind of not. It's a tweet from GMO, so let's go through it: "One of the painful lessons in infrastructure engineering is everything misbehaves. Your own client code is your adversary. You need to bake in load shedding, circuit breaking, rate limiting, observability, checksumming on every layer of the stack."
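To make one of those terms concrete, here is a minimal, dependency-free circuit breaker sketch. All the names and thresholds are hypothetical, not taken from any particular library; real implementations add half-open probes, metrics, and per-endpoint state.

```typescript
// Minimal circuit breaker: after `maxFailures` consecutive failures the
// circuit "opens" and calls fail fast until `cooldownMs` has elapsed.
class CircuitBreaker {
  private failures = 0;
  private openedAt = 0;

  constructor(private maxFailures = 3, private cooldownMs = 10_000) {}

  async call<T>(fn: () => Promise<T>): Promise<T> {
    if (this.failures >= this.maxFailures) {
      if (Date.now() - this.openedAt < this.cooldownMs) {
        throw new Error("circuit open: failing fast");
      }
      this.failures = 0; // cooldown elapsed: allow a trial request through
    }
    try {
      const result = await fn();
      this.failures = 0; // success closes the circuit
      return result;
    } catch (err) {
      this.failures++;
      if (this.failures >= this.maxFailures) this.openedAt = Date.now();
      throw err;
    }
  }
}
```

The point of failing fast is exactly what the tweet describes: a dependency that is already struggling should not keep receiving your retries.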

5:12

Now, this is in response to T3 Chat, which had an outage, and honestly, when these things happen, you've got to hug the people going through it. People in the chat, if you've ever been through firefights, especially when it's a third-party dependency, it just sucks. There are very few times you feel truly helpless in your job, right, and this is

5:39

one of those times, where you try to do something that you really can't control. But it's no one's fault; it's just the way these things go. So T3 Chat, if you don't know, is a very popular AI chat app. It's really fast, it's from Theo, you can switch models. It's

6:01

really cool, and they've been working towards this next version. I resonate a lot with this tweet because I've been there, and I think a lot of us have. I really appreciate him posting this accountability; it keeps things transparent. These types of posts are

6:27

what people talk about internally. For example, we've had outages on our cloud already. Actually, I'll tell a story about that, how's that? But I can't see the stream, so let me just move. Okay, great, I can do this, and then I can see you guys here. Okay, great.

6:53

Okay, so I'll tell you a story about this. When we first started Mastra back in October, we got into YC around January, let's say January 6th or something. We had our cloud product done in about three weeks, and it wasn't good, but it was an initial version. At the time, YC was giving out these credits for Google, and so we got our

7:18

Google credits. Actually, we haven't gotten those credits yet, by the way, so Google, give us those credits. But we met with our account manager, and we were like, "Hey, the default setting in Google is only six VMs, and we're going to need way more than that." They were like, "Oh no, you can just do a

7:37

quota increase." Okay, you can do a quota increase whenever you want. All right, sure. So, our bad: we procrastinated for a long time, and then one day people started using us, right? We blew through six VMs, and then no containers could get scheduled in our cluster. So we go to the quota page and request an increase: hey, we need 100 VMs. The

8:03

quota gets rejected immediately. So now you're like, okay, what are you supposed to do? We escalate to the account manager, and the account manager is like, "Oh, the quota team is a different team than the account team, so I need to file a separate request." Then they also asked me if I had a support contract, and I'm like, "Dude, I'm on the starter plan for Google for

8:26

Startups or something." So then we were down for about 17 hours, which is not good at all. Thankfully it affected very few users, but it's truly a place where you feel very powerless. So you always have to hug your ops people, because sometimes you go through this, right? So let's go through this post. He's very transparent, I really commend him, and

8:54

he's very funny and vocal about a lot of stuff in the AI space, so if you don't follow him, I'm sure you already do, right? But maybe you should if you don't. So they had an outage. It's working now, but let's go through it: "We've been working hard to move over to Convex as our data layer and sync engine for T3 Chat. Convex is a great product. This might seem like

9:19

a database swap, but it goes much deeper. It's effectively a full rewrite of T3 Chat." When you do full rewrites like this, you have to gauge which path you're going to go down. Are you going to do a full cutover? Or a progressive change, where you have to do dual writing and things like that? Most of the time you want

9:39

to do whatever's easiest, right? But the problem is, for crazy rewrites you usually have to do the one that's the most annoying. I'm not sure which option he picked, but he's doing it. So: he had three failed migrations.
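The "progressive change with dual writing" path can be sketched roughly like this. These stores and names are hypothetical, not T3 Chat's actual code; the idea is just that the old store stays authoritative while the new one is shadow-written, and the read path flips only once the new store is trusted.

```typescript
// Progressive migration via dual writes: write to both stores, read from
// the old one until the new store is verified, then flip the read path.
interface MessageStore {
  save(id: string, body: string): void;
  load(id: string): string | undefined;
}

class DualWriteStore implements MessageStore {
  constructor(
    private oldStore: MessageStore,
    private newStore: MessageStore,
    private readFromNew = false, // flip once the new store is trusted
  ) {}

  save(id: string, body: string): void {
    this.oldStore.save(id, body); // old store stays authoritative
    try {
      this.newStore.save(id, body); // best-effort shadow write
    } catch {
      // a failed shadow write must not break the user-facing path
    }
  }

  load(id: string): string | undefined {
    return this.readFromNew ? this.newStore.load(id) : this.oldStore.load(id);
  }
}
```

The annoying part he alludes to is everything around this wrapper: backfilling old data, comparing the two stores for drift, and handling writes that land during the cutover.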

9:58

"We finally had a successful move around 8:00 p.m. last night. This is our lowest traffic window, 40% of our peak traffic." That's really cool that he knows that. "All looked good. I was pumped. A month of effort finally shipped. I literally slept for 12 hours. Woke up to utter chaos." Dude, I know how this feels, man. You go to sleep thinking everything is chill, then you wake up, everything seems okay, and you don't know.

10:16

And then you find out. Let's say you wake up at 8 a.m., right? You find out it started at 2 a.m. Obviously maybe someone's looking at it, maybe someone's not, but most of the time some of these things just happen, and if you don't have good paging and stuff, you'll never know. So,

10:35

TL;DR: a traffic spike took down their WebSocket connection layer, and some bad client code from the React package caused a reconnect loop that effectively DoSed the Convex endpoint. I wonder if it's some useEffect thing, to be honest. Convex will have a detailed write-up in the future; we should take a look at that when it comes out.
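The standard defense against exactly this reconnect-storm failure mode is capped exponential backoff with jitter on the client. Here's a small, dependency-free sketch (all names and numbers are illustrative, not from Convex's React client):

```typescript
// Reconnect with capped exponential backoff plus jitter. Without the cap
// and the jitter, every dropped client retries immediately and in sync,
// and the reconnect storm itself can DoS the endpoint.
function backoffDelay(attempt: number, baseMs = 500, maxMs = 30_000): number {
  const exp = Math.min(maxMs, baseMs * 2 ** attempt); // 500, 1000, 2000, ...
  return Math.random() * exp; // "full jitter" spreads clients apart
}

async function reconnectWithBackoff(
  connect: () => Promise<void>,
  maxAttempts = 8,
): Promise<boolean> {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      await connect();
      return true; // connected
    } catch {
      await new Promise((r) => setTimeout(r, backoffDelay(attempt)));
    }
  }
  return false; // give up and surface an error to the user
}
```

The jitter is the important part: backoff alone still lets thousands of clients retry at the same synchronized instants.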

10:55

Yeah, and hey, I just said this: "Actual status updates and reporting in-app. Right now it's reported via Twitter. We have a real app now; we shouldn't be doing that." 100%, dude, that's a great takeaway. A paging system, like I was just saying: if you don't have paging, then you just wake up to six hours of darkness, and people are

11:17

probably hating you for that type of stuff. "I hate PagerDuty too," Theo, but I think it's the best one, given how the competitors have fallen away. There's FireHydrant, incident.io, a bunch of different ones, but I guess PagerDuty has stood the test of time, maybe, I don't know. Chat, what are your thoughts on PagerDuty, if you have any? Okay: "A lot of the issues

11:48

we had today were caused by a bad client-side package (convex). Even when we pushed a fix, a lot of users were still on the old version and would stay on it until refreshing." Yeah, that's the classic browser client issue, man. It sucks. "Please refresh" buttons: no one reads anything, so it's never enough, and so you can't really afford client-side issues. You have to be a lot more

12:13

careful about that. Oh man, I feel bad for them. "Evaluate all upstream providers to make sure they are prepared for T3 Chat's load." Very good. Yeah, when you're using third-party products, you've got to make sure they're ready for prime time when you are. But yeah,

12:37

so this is the postmortem here, the accountability post, blameless stuff. I think he got a lot of heat, as always, but that's just because of the notoriety; it got a lot of views. But hug your ops guy, hug your founder who's going through some

12:57

outages. This is a community of people all doing the same thing; we shouldn't be hating on people for operating our products. Operating a product is a job, right? You don't just get to build software and leave it around rotting. So yeah, that's our first one. Thank you for that, that was cool. 64 of you here in the chat. I know it's

13:20

probably boring with just me here, but it's going to get more exciting. I have just two more news items and then we'll get on to some coding. The next one is Osmosis. So Osmosis, the company, is from our YC batchmates, and Andy from Osmosis was on our livestream teaching

13:53

everyone how to do ML, which was cool. So let's talk about what Osmosis-Structure-0.6B is. This is a smaller model, and it acts as a model that can take any unstructured text and turn it into proper structured output. So why is this important? First, SLM is a really cool term that

14:20

no one really uses right now, right? It's a small language model. We're all using LLMs, large language models, but small language models are good for specific tasks, or when you're very specialized in what you're trying to do. In this case, this SLM is designed to help create

14:40

JSON or structured output, and it's super small, but based on his benchmarks it can turn any unstructured text into a given JSON schema. Why is this important? Like I was saying, structured output support really depends on the LLM, and I know a lot of our tooling uses things like generateObject, or generating with a JSON schema

15:11

or a Zod schema if you're using the AI SDK, or, if you're using Mastra, we have the output parameter. But then it really depends on whether the model supports that, right? It also comes down to the fact that, obviously, tool calling differs between LLMs. We just released a blog post on tool call compatibility, which really shows you how you have to coerce the LLM

15:36

to do this type of work. And with unstructured data like this: if you just took unstructured output and handed the model a JSON schema, it'll do its best, but it'll miss things. So I would try using it; it's pretty cool.
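As a rough sketch of what structured-output helpers do under the hood (this is an illustration only, not the AI SDK's or Mastra's actual implementation; all names here are made up): parse the model's text as JSON and check it against a declared shape, so callers either get a typed object or a clear failure they can retry on.

```typescript
// Minimal structured-output check: parse model text as JSON and verify a
// flat field -> type shape. Real helpers (generateObject, Mastra's output
// option) do much more; this only illustrates the idea.
type Shape = Record<string, "string" | "number" | "boolean">;

function parseStructured(modelText: string, shape: Shape):
  { ok: true; value: Record<string, unknown> } | { ok: false; error: string } {
  let value: Record<string, unknown>;
  try {
    value = JSON.parse(modelText);
  } catch {
    return { ok: false, error: "model did not return valid JSON" };
  }
  for (const [field, type] of Object.entries(shape)) {
    if (typeof value[field] !== type) {
      return { ok: false, error: `field "${field}" should be a ${type}` };
    }
  }
  return { ok: true, value };
}
```

For example, `parseStructured('{"city":"Mechelen","temp":21}', { city: "string", temp: "number" })` succeeds, while a chatty non-JSON reply fails with an error you could feed back into a retry prompt.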

16:00

We're going to release something too. Not an SLM or anything, but we have some application-level utilities and benchmarks we can show where you can also achieve this with an LLM, sort of. But the beauty of SLMs is the really cheap cost, right, because they're

16:18

mostly open source. You have to run them yourself, but you could go through those model providers to do it, and you pay significantly less money. So that was cool. One thing from the tool calling compatibility research we did: Anthropic's Claude is the best at tool calling, but the new DeepSeek R1 is amazing at tool

16:44

calling. Actually, why don't we talk about tool calling real quick? All right, let's talk about what we did, just to toot our own horn. Let me share this. So yeah, this is a blog post by Daniel Lou from Mastra, and we've had this problem for the entire existence of the company so far: everyone was

17:21

having tool call errors across different models, and every time we'd get issues in our Discord or on GitHub, they'd be like, "Hey, I'm trying to use this tool but it doesn't work with X model." Then, with MCP coming on the scene, that gets even harder, because usually when you're testing your MCP server, maybe people are just using

17:46

one model. Maybe they're using OpenAI, which is generally good at tool calling. So if you're only testing your MCP server with one model, then your tools are biased towards working with that model. And that's what we discovered: a lot of people were bringing their own MCP servers from

18:07

anywhere, and you can never know the quality of an MCP server, and you don't really know its quality in relation to the model it was tested with, or the model you're using yourself. So we had to do something about it, because the volume of tool call error issues is, one, distracting from true

18:30

issues, because it's not really something we controlled, but it is something we could do something about. And if you can do something about it within a framework, that's the right place to do so. And thirdly, we're going to extract this tool calling magic into a separate package, so if you're using other frameworks or libraries, maybe you

18:51

could take advantage of it too. It'll be called @mastra/schema; I don't know if that's a great name, I guess. So what did we do? First, let's talk about the results. Before, man, it was kind of shitty. Tool calls were just erroring, at 16% and 11%, and 11% of your

19:19

tool calls erroring is a lot, you can imagine. We did a whole test where we created a bunch of tools and ran them through benchmarks across different models, to see whether we got a tool call error and why. So we scoped the problem. Here are some really cool things we noticed. With OpenAI, if you

19:46

used an unsupported property, it would throw an error like "invalid schema for tool X," which is an error many people see. Gemini models don't explicitly fail, but they ignore things like the length of a string or minimum array requirements. Anthropic is chill. DeepSeek and Llama weren't the best

20:11

at tool usage, and sometimes didn't even call the tool, and even prompt coaxing didn't really help. That was just not good, and many people are using DeepSeek and Llama models through providers like Fireworks AI. So the things that seemed within our control were to fix these schema errors, and to handle schema constraints being ignored,

20:38

because if you want to truly improve tool calling beyond that, you'd have to go to the model itself. So we tried different approaches. First, on the input schema, we tried transforming the output to make sure it was all shaped properly; it didn't really work. Then we tried injecting more into the prompt, but

21:00

that works for some models but not for all. You have to do something a little more general here. So finally we did a bunch of small things: we folded schema constraints into the tool instructions, we went a little deeper, we added better tool descriptions, we modified schemas

21:27

based on the model. So for a given model, like o3, we changed the formatting of this stuff so it would better know what the thing is, so the user isn't on the hook for knowing how to do everything, which is the beauty of a framework. You can see, before we were

21:51

just doing a string with "uri," and since "uri" isn't really universally accepted, now we just say: hey, this is a string, but in the description we explain how it should look. And this works surprisingly well. So anyway, that's cool and in line with just improving the LLM response. So that's dope.
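A toy version of that transformation might look like this: strip constraint keywords a provider may reject or ignore (like `format: "uri"` or `minLength`) and restate them in the description, which every model does see. This is an illustration with made-up names, not Mastra's actual implementation, which handles far more of JSON Schema.

```typescript
// Toy "constraints into descriptions" transform: remove JSON Schema
// keywords some providers reject or ignore and restate them in the
// human-readable description.
interface StringProp {
  type: "string";
  format?: string;
  minLength?: number;
  description?: string;
}

function softenStringProp(prop: StringProp): StringProp {
  const notes: string[] = [];
  if (prop.format) notes.push(`must be a valid ${prop.format}`);
  if (prop.minLength !== undefined) notes.push(`at least ${prop.minLength} characters`);
  if (notes.length === 0) return prop; // nothing to soften

  const { format, minLength, ...rest } = prop; // drop the raw keywords
  return {
    ...rest,
    description: [prop.description, ...notes].filter(Boolean).join("; "),
  };
}
```

So `{ type: "string", format: "uri" }` becomes `{ type: "string", description: "must be a valid uri" }`: same intent, but expressed where every model can act on it.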

22:20

Oh, Rude Boy is here, I'm going to show this: "Is it 16 of 100 calls to the same tool that are failing, or do some tools work and others don't?" It's definitely a mixed bag, right? It depends on the schema, and also on what you're trying to do in that call. We'll post, I think you did post, I'll link the data

22:48

here so everyone can take a look. We have this Google Sheet if you're interested, and you can look at all the data we collected. I don't know if you can actually get to it, but it has all the data. I'll share it; we can go through it right now. One second, let me share this.

23:17

So we have this. We tested pretty much all these models, and then we tested a tool with, say, a string property and all the different types of constraints on it (in Zod you can do all these things, right: numbers, etc.). Then we wrote tools that leverage these things, and we got to see where things

23:48

fall. We're going to do more work here; this is the first step, but we can definitely see the number of tool call errors going down, so that's cool. Then let's move on: I have two more news items. I know news is always boring, at least it is to me. If you're just joining, 84 of you, welcome. I'm Abhi from Mastra, and today I'm solo.

24:22

Usually I have a partner. If you're just joining us, I'm going through some AI news. We talked about T3 Chat's outage and how we should all be more empathetic to people. Then we talked about Osmosis's SLM for turning unstructured text into structured output. And, this is from Andy, using smaller models is

24:48

sometimes a really good way to get more efficiency. If you're really into RL and making smaller models, Osmosis is a great place to go if you've got the money; you can essentially wrap your models and eventually create a smaller model

25:11

that is more trained on your use cases. SLMs are really cool, and I think that's going to be a big subject area going forward. Then we talked about tool call compatibility, which is a natural segue: at Mastra we did some research on tool call compatibility. These will all be in the show notes after; I'll post those on

25:30

Twitter. But yeah, that's where we're at. We've got, I think, one more. This one's a funny one, that's why I saved it for the end, and then we're going to bring in the homies. All right, this is funny, so I just want to talk about it: Meta plans to replace humans with AI to assess privacy and societal risks. Now, this is not the first time I've heard of a

25:56

company doing their trust and safety through AI, or promising to change their trust and safety protocols based on it. So let's go through this. Let's see, where did I put it? Oh, it's right here. Okay, so first of all: there are people at Meta, and there are a bunch of

26:27

algorithms and stuff too, that handle this. All the dangerous stuff happening on Facebook and Instagram goes through a content enforcement policy, and there are actual people who review these things. There's a lot of porn, just straight up, a bunch of porn and bad stuff. I mean, okay, I'll just leave it

26:51

at that: there's just a bunch of bad stuff that these content moderators have to go through, and then they'll choose to do bans and so on. I remember my old college friend Trevor Pottinger got his first job at Meta back in the day, essentially writing programs that assess whether images are inappropriate. We used to make fun of him about how much porn he was

27:17

watching at work. It's kind of funny, though. And you can imagine this being assessed with AI; that's essentially what they were building, models to do detection. But let's see where things are going now. "For years, as Meta launched new features for Instagram and WhatsApp, teams of reviewers evaluated possible risks:

27:42

could it violate users' privacy? Could it harm minors? Could it worsen the spread of misleading or toxic content?" The ironic thing here is they've gotten into so much trouble for this anyway; even with all the reviewers, dude, they got screwed. But anyway. "Until recently, what are known inside Meta as privacy and integrity reviews,"

28:02

that's the stuff I was talking about, "were conducted almost entirely by human evaluators." Yes, though they had software that helps them too. But let's take it for what it is. "Now, according to internal company documents obtained by NPR," someone whistleblew on this, "up to 90% of all risk assessments will soon be automated. In

28:23

practice, this means that critical updates to Meta's algorithms, new safety features," blah blah blah, "will no longer be subject to scrutiny by staffers tasked with debating how a platform change could have unforeseen repercussions or be misused." Interesting. Inside, the change is a win for product developers, because they can just ship, and in theory the AI will protect them. But then how

28:51

many people will lose their jobs, I wonder. "So far, the process functionally means more stuff launching faster with less rigorous scrutiny and opposition. It means you're creating higher risks; negative externalities of product changes are less likely to be prevented before they start causing problems in

29:09

the world." Man, they're so lax with how they talk about this, huh? Especially when you have that many users, you could just tank economies. Anyway. "In a statement, Meta said the product risk review changes are intended to streamline decision-making, adding that human expertise is still being used

29:29

for novel and complex issues, and that only low-risk decisions are being automated." Nice, they save face here, because that's what you should do, especially right now; I don't think we're there yet for full autonomy. "But internal documents reviewed by NPR show that Meta is considering automating reviews for sensitive areas,

29:49

including AI safety, youth risk," what the, "and a category known as integrity that encompasses things like violent content and the spread of falsehoods." Dude, I just called this: this is what my friend was working on back in the day. That's crazy. All right, now I'm getting skeptical. A former Meta employee: "engineers are not privacy

30:11

experts." A slide describing the new process, blah blah blah: "AI-driven decisions will identify risk areas," blah blah blah. Dude, these guys are super pilled on AI, which is cool, but I don't know, I guess I'm just a little negative. Let's just see; I don't really care about a lot of this.

30:34

The EU could be insulated from these changes, probably because there's a lot of Digital Services Act activity going on over here. All right, I'm just going to do this last one: is moving faster to assess risk self-defeating? "Another factor driving the changes to product reviews is a broader, years-long push to tap AI to help the

30:52

company move faster," blah blah blah. "We're beginning to see LLMs operating beyond that of human performance for select policy areas." I wonder if they just hired bad people, though; or, I mean, LLMs really are that good now. "This frees up capacity for our reviewers, allowing them to prioritize expertise on content that's more likely to violate." This also says: this frees up our cap space, our salary

31:19

budget, because we won't need as many people. Okay, I don't care about this part; let's just see. "I think it's fairly irresponsible given the intention of why we exist," I just called that, "we provide the human perspective of how things can go wrong." Now, famously, old Twitter had the same type of policy board, right, and

31:48

sorry, there's a bunch in the chat, I'll get to you in a second. Twitter also famously had this same drama, during everything in the US at least. I don't even want to get into all of it, it's a very touchy subject, so let's not. But Twitter also went through this, and they found that the content moderators were not acting in the right way. Now, that's

32:15

okay, I guess. I mean, that's not okay, but humans have their own judgments too. So imagine AI with the right system prompt; and it can change, right? An AI's opinions can change more readily than a human's; I think a lot of us are set in our ways, of course. But I just think that is such an interesting article, and it's hilarious that we're back in the same

32:39

kind of conversations again. We've been automating stuff for years, right, but now with AI there's opportunity here. People who want to keep their jobs are obviously going to be against it; I get it, I feel for them. And then, from the executive point of view, they're thinking

32:56

about it in terms of dollars, right? I don't think they really care about risk and safety, to be honest. More money, more problems. Let's go to what you guys were talking about in the chat while I was going down memory lane over there. Cool. Hey, Siobhan, thanks. You're active, or

33:17

you're a repeat viewer, so I'm glad you're back. 103 of you here right now. I'm not going to do the recap yet, but I will do it later. "Last week I was confused about the Claude 4 hype and why the official stuff didn't seem impressive. Okay, I was mistaken. Claude 4 has a 14% score improvement over Claude 3." That may be true, but I

33:40

don't know if those results are felt. They're felt by some, but not by many, so maybe some percentage of people aren't getting the benefits. I mean, I used Claude 4 over the weekend and it was pretty sick. But I was also not a believer last week when I first used it, so iteration is key.

34:06

I'll just finish with Siobhan's thing; he says, "I misread labels on the graph comparing two models." Okay, for sure. Then Rude Boy: "I was CIO of a medical research institute and blocked all inappropriate images. Then all the researchers reminded me they needed to see medical images, which were getting misclassified." Nice.

34:32

That happens. I remember my friend, when he was working on the safety algorithms, people would post food that would get misclassified, like a hot dog or a jellyfish. Anyway, I'll leave it at that. The EU really leads with online safety, for sure.

34:56

Okay, so, 119 of you. We're going to move on to our next topic; I'm going to bring in my homie. Okay, let me just send this over to him and let's switch over. Did you guys see the countdown on the stream today? We're getting better at this streaming stuff, you know. And with that: hey, Tony, how do I make this...

35:36

How's it going, Tony? "I'm good, how are you?" Pretty good, pretty good. We have about 127 people in here, so welcome. Just to do a little recap, because I did promise one: we did some AI news. We talked about GMO's post about infrastructure and how you've got to be diligent, in response to the T3 Chat outage. We went through that, and we talked about some stories

36:08

about how Google screwed us over with the quotas, and about how people should be more empathetic to those who operate software. After that, we talked about Osmosis and the structured output model they made, and about the tool compatibility research we did. And lastly we talked about how Meta

36:35

is trying to replace their privacy, trust and safety, and content moderation with AI, and I just made a bunch of jokes about how that is more self-serving than trying to make things better. And now we're here to talk about agent networks. So I'll pass the mic off to you; let's just get into it, let's educate first.

37:02

Okay, well, let's talk about what agent networks are, then, to start. As far as Mastra's primitives go, you have agents, which are these individual LLM-based components, right? And then you have workflows, which let you create a deterministic flow of actions, which may or may not contain

37:24

those agents; often they do, in fact often several of them. And then there are specific control flow checks you can do: say, okay, if this agent answered this way, then you do something else; if it answered another way, then you do again something a little bit different. But the control is fully up to you. You have to pre-program

37:45

everything, you have to handle the cases, it runs only as long as you say it should run, and so on. Now, agent networks are kind of the other side of that coin, where you just have a group of these different primitives, agents, workflows, tools, and what have you, and you want an LLM to somehow figure out which of

38:07

these primitives should be called. I think there are two main use cases for this type of interaction. One is more of a one-off thing, where you have some sort of unstructured input and you need to figure out which primitive to call, whether it's a workflow or an agent or whatever, and then it transforms

38:27

the prompt, or the input, from that unstructured initial input into the thing you actually need to pass to that primitive, calls it, and you get a result back. The other use case is more of a complex, task-solving scenario, where you need to call multiple primitives in sequence, and the agent needs to figure out which primitives to call, in what order, and

38:51

when the task is actually finished. Nice. Let me share my screen and we can draw this out for our homies here. I just need to make sure I've got my windows set up properly, but can everyone see my Excalidraw? All right. So Tony was saying there are two different kinds of paths here, right? Let's talk about the first one, Tony. Let's say we have the

39:18

network so let's actually make this a big ass box let's say we have a network that comprises of what so agents Mhm A1 A2 A3 we can also have workflows so W1 W2 sorry my Excalidraw skills are slow um and W3 but then both of these things can have I guess tools right so T1 Mhm T2 Yeah the agent network itself can also have tools so that's like an additional dimension in there and then we can Oh

40:15

yeah then we can say like AN T1 right but then what about memory so yep so agent network in itself has memory in it um that memory is basically used for what I kind of call working memory or scratch pad memory whatever you want to call that it's basically memory that is associated with a specific um task only so if you were

40:48

to run one of these more complex tasks that would involve let's say at the end of it like 10 different primitive calls let's say two workflows and eight agent calls with some prompt enhancements and all kind of stuff in between um the agent network's memory is what is used to record all of those like

41:05

reasons for why a specific tool was called all of those different outputs from the different primitives just so when the routing model needs to again decide what to do next it has all of the context it needs to do so but then in addition all of the agents themselves can have their own memory right so when you create an agent you can pass memory

41:23

to that agent so there are several instances of memory at play here make a little bubble thing oops sorry there it is oh my god let me put M okay we got M here we could have an M here and maybe we don't have M there yeah it is optional so also to illustrate workflow one could comprise of these two agents as steps right
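
The structure being drawn here — a network wrapping agents, workflows, and tools, with workflows nesting agents or other workflows as steps — can be modeled as a small recursive type. This is just an illustrative sketch of the shape being described, not the actual Mastra API; every name in it is made up:

```typescript
type PrimitiveKind = "agent" | "workflow" | "tool" | "network";

interface MessageStore {
  threadId: string;   // one thread per task execution
  messages: string[];
}

interface Primitive {
  id: string;
  kind: PrimitiveKind;
  description: string;      // the routing model picks primitives based on this
  memory?: MessageStore;    // optional: agents can carry their own memory
  children?: Primitive[];   // workflows can nest agents, tools, even workflows
}

const researchAgent: Primitive = {
  id: "A1",
  kind: "agent",
  description: "general research on any topic",
};

const cityWorkflow: Primitive = {
  id: "W1",
  kind: "workflow",
  description: "deep research on a specific city",
  children: [researchAgent],   // a workflow using an agent as a step
};

// a network is itself a primitive, so networks can nest inside networks
const network: Primitive = {
  id: "N1",
  kind: "network",
  description: "research and writing network",
  memory: { threadId: "task-1", messages: [] },   // scratch-pad memory
  children: [researchAgent, cityWorkflow],
};

// count every unique node reachable from a primitive, however deeply nested
function countNodes(p: Primitive, seen = new Set<string>()): number {
  if (seen.has(p.id)) return 0;
  seen.add(p.id);
  return 1 + (p.children ?? []).reduce((n, c) => n + countNodes(c, seen), 0);
}
```

Here `countNodes(network)` returns 3: A1 is shared between the network and the workflow, so it's only counted once, which is exactly the "everything can link to everything" structure on the whiteboard.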

42:00

this could be a tool so this could be like a tool like this that calls an agent sorry the agent calls a tool which is a workflow or it could just call a tool itself yep or a workflow could have a workflow as a step exactly so boom so now we're getting this network structure right um and then even these could all

42:26

be linked as well and then a to a exists so you could do this and you can see that we're in this kind of structure um yeah let's keep going down this graph so you were talking about the scratchpad memory so I guess we can go through the first use case and kind of illustrate how you how that use case interacts with this network yeah so in the first use case the memory is definitely less important because you

43:00

don't really need memory to figure out how to just run one primitive the only context that you have really is the unstructured input and you have all of the different descriptions and input schemas and what have you whatever metadata we have about each of the primitives that are registered on the network so for that first use case the

43:18

memory is kind of in a way a little bit irrelevant it's more if you wanted to show that in a chat interface it becomes more of a traditional vehicle in that um you can have a chat where um you send one message that then executes one primitive um you get some kind of result back you can send another chat message and it then re-executes and then maybe it runs

43:44

the same or some other primitive so the memory is kind of more of like a traditional agent memory in that use case I would say so in our first like let's do like an example like task that a network could do and then we can maybe walk through what's maybe there's like some more components to add within the network

44:02

here right so let's say like what's a good task um well one of my favorite tasks and I don't know if it's a good one to illustrate here per se is one that I tend to need the most myself which is you have some kind of unstructured um metrics or server logs or something i think server logs is a pretty good one so you're just getting a whole bunch of logs from some back end or whatever and

44:27

then some of those are maybe errors some of them are warnings and then you kind of want to figure out uh how serious is it what's actually happening there and then you need to determine some course of action for that um so this is like the data that we're passing to the network with a prompt right yes like figure this out

44:55

not like that but pretty much though okay although in the case of the first example we would pass maybe just one log like one error like maybe we would just filter out errors and pass them onto the agent network and then it would basically figure out how grave is this error that we see in the logs yeah
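
That pre-filtering step — keep only the error lines from raw server logs before handing one to the network — is trivial to sketch. The function name and log format here are invented for illustration:

```typescript
// Pull just the error lines out of raw server logs; each surviving
// line becomes an unstructured input for the agent network to grade.
function extractErrors(logLines: string[]): string[] {
  return logLines.filter((line) => /\bERROR\b/i.test(line));
}
```

For example, `extractErrors(["INFO boot ok", "ERROR db connection refused", "WARN slow query"])` keeps only the `ERROR` line.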

45:12

um so then the next thing maybe would be that okay maybe um maybe it's some kind of a database connection issue which is pretty bad so maybe what it does is it runs a workflow um that takes in who's the one deciding that because that's the component missing okay so then the component deciding that is what we call the routing model inside of the agent network that's kind of the main driver

45:38

um it takes in the initial inputs it figures out um or it has in its context all of these different tools available to it it has the memory attached um and it figures out okay what is the thing that best suits whatever is happening here after it figures out what it is it usually passes in if it's let's

45:58

say a workflow it figures out what do you need to do to take whatever the user put in and to actually configure that as the input schema of that workflow if it's an agent it usually creates a more specific prompt for whatever that agent is specialized to do uh and then it calls that primitive yep so it has this like reasoning here to then say based on

46:25

whatever was asked of me execute the right playbook or the right move you know which could be calling an agent or it could be executing a workflow or like any mixture or and you know as we saw before some of these things are connected right so by doing something it'll also trigger the other components as well yeah it's more like

46:50

it chooses the right registered entry point so like you are in control of which one of your Mastra primitives you want to be registered into this specific agent network so those are the different entry points for actions to start uh whatever that agent does or whatever that workflow does it's its own thing but these are like these are those

47:09

primitives that the agent network has the ability to call with whatever tasks you're giving it yeah for sure okay should we talk about the second use case sure so the second use case uh one more thing maybe about this first one is like if we just did one kind of a sample error like it could be let's say a

47:29

database error which seems pretty bad so maybe there's like one workflow that's specialized in uh sending the right Slack message uh creating an incident on PagerDuty and whatever does all these things maybe use an LLM to create a description and also to estimate like the escalation level of the issue

47:48

so there's a lot that could happen in that workflow um or maybe it's something else where like the error seems kind of random and you don't really know what it is so maybe you just call an agent to like give you a better idea of what's actually happening and then you can follow up with another action yourself afterwards so it could be used in a very

48:07

multi-purpose way yeah because once you get the first response out of this work like this execution you could then be like oops right you could then be like "Oh but how about X you know or like and then that would then trigger maybe these?" Yeah and then just keep talking it's essentially you keep talking to the network in an iterative way but that's where the memory comes in

48:38

in this use case is really through this ability to like use those primitives first to maybe gain more understanding but then also trigger those actions that you know are available to you like trigger maybe a workflow to escalate things if the issue is particularly bad um it's all up to how you actually prompt it like if you prompted it to say run the most appropriate action instead of to give

49:02

you more information then that's what it would do right yeah 100% so like in this behavior of the network you can talk to the network in a Yes in a I don't want to say chat way but you can literally talk to it unstructured text is just chatting right it's I mean you can chat with it as long as you're calling agents it does really feel like chatting at least to an extent maybe a little bit

49:31

less than directly chatting with just one agent but it does have a very similar experience whenever it does call a workflow it's really more because you prompted it in a way that you wanted some actions to happen so it still kind of does feel like that it's more like you just told it to do something and then it did that so yeah um okay so let's talk

49:50

about the next i'll take our diagram here and I'll just copy it and now we're in this and let me get rid of these reset my nice little state transition I was doing live the colors and but hopefully people like it actually one thing um the amount of viewers has just bazillioned up so let me just say what up to everyone real quick um 237 of you legends in here i

50:24

think some other podcast does that um but anyway um welcome if you're just joining us I'm not going to I'm just kidding uh if you're just joining us we're talking about agent networks it's actually a very timely topic in AI in general um because this is the dream right this is what people want when they think about the future of AI is just like some autonomous systems just running based on

50:50

you know you just saying something like unstructured like this but we're going to move on to the next uh kind of layer of this and I'll let Tony kind of explain that while I draw sure so the second use case is really this more complex task execution where you need multiple primitives executed in sequence uh or in parallel just to do individual pieces of it and

51:19

then probably you need some other tools to then tie everything together at the end of it um now this is where what I called scratch pad memory earlier becomes way more important because now instead of looping back to the user to like for the user to prompt something more specific again the routing model is the thing that keeps prompting all these

51:41

different tools whether they're workflows agents uh tools directly um maybe even other agent networks like any of these primitives that you've registered on to this agent network the routing model can and will call them if it deems that appropriate and there is absolutely no human feedback that you inject into this until the task is actually completed
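
The routing model Tony describes can be caricatured in a few lines to make the control flow concrete. A real agent network asks an LLM to pick a registered entry point based on each primitive's description and then rewrites the raw input into a prompt for it; here a keyword match stands in for the LLM, and all names are invented for this sketch:

```typescript
interface RoutablePrimitive {
  id: string;
  description: string;      // what the LLM would select on
  keywords: string[];       // toy stand-in for semantic matching
  run: (prompt: string) => string;
}

// the registered entry points of this toy network
const registry: RoutablePrimitive[] = [
  {
    id: "incidentWorkflow",
    description: "pages on-call and opens an incident for database errors",
    keywords: ["database", "connection"],
    run: (p) => `incident opened for: ${p}`,
  },
  {
    id: "triageAgent",
    description: "explains unfamiliar errors and suggests next steps",
    keywords: [],            // fallback when nothing else matches
    run: (p) => `analysis of: ${p}`,
  },
];

// "routing step": choose the entry point whose description best fits
// the input, then rewrite the raw input into a prompt for it
function route(input: string): { id: string; output: string } {
  const chosen =
    registry.find((p) => p.keywords.some((k) => input.includes(k))) ??
    registry[registry.length - 1];
  const prompt = `Handle this log line: ${input}`; // input transformation
  return { id: chosen.id, output: chosen.run(prompt) };
}
```

So a database error gets routed to the incident workflow, while an unfamiliar error falls through to the triage agent — the same "pick the registered entry point, transform the input, call it" loop, minus the LLM.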

52:10

nice shall we go through an example for it sure um I mean one kind of I guess obvious maybe even like a little cliche example would be some kind of like um writing of an article or a blog post or something where you have different stages you have like the research stage um especially if you're writing an article um then you write a draft right based on

52:37

that research um then maybe there's some kind of like fact-checking involved and then so sorry you research yep a draft let's write a draft some kind of fact-checking and then you revise the draft and then hopefully at the end of it you have like a you know a fully polished ready to go version um there

53:06

there may be research into like different different things happening simultaneously maybe there's different topics you need to research you need to put those together uh but those are like the individual actions actions that typically typically would happen now if you were to ask um an agent network that

53:23

had let's say three different agents registered to it would have a research agent uh or maybe a workflow that combined multiple agents to do research effectively there could be like a fact checker agent and then there could be like a writer I guess or like an author agent that's good at just synthesizing text in

53:44

a specific way with specific guidelines for how you wanted that article to look like and all that stuff um then yeah if you were to ask it to write an article about any given topic let's say about I don't know uh about trains different models of train or something then um it would probably first want to trigger some kind of research right so yeah it goes to the

54:11

routing model and the routing model is like hey I need to write something about trains so I go from the routing model to the research agent right or the research that's what it'll probably figure out because it's like okay um I have all these primitives registered to me this specific agent is specialized in

54:31

doing research on whatever so that's probably the place to start so it's going to tell you something about trains then maybe depending on your initial prompt that research is not enough so when the routing agent gets that research it's like okay that's a good start but I want to know more about X Y and Zed like maybe about the history of trains because that wasn't in the initial research or yeah whatever right

54:53

so it could be doing multiple kinds of research either simultaneously or in sequence um if it just decides that the research I got the first time is not enough oh yeah it could do research in parallel potentially yep nice and oh yeah also one thing to note the routing model is the one prompting the agent not the human yes so what happens is you have this

55:17

initial task description which typically ends up being pretty long when you're doing a complex task execution like this um so you're quite specific about what you actually want to accomplish uh maybe you also have some idea of which uh like what your concluding output should look like maybe

55:37

you have some kind of a template or whatever like the description ends up being quite complex but the routing model doesn't send any of that information to any of these individual agent calls or workflow calls or anything it creates its own prompt based on all the decisions that it's taken before and based on like where it's actually at in the current execution so

55:56

when it prompts the research agent it will probably instead of your whatever your long description of a task is it will probably just ask the research agent hey like can you give me some general information about trains and then once it gets that it maybe figures out other areas that it needs to research and then it will do that

56:13

research as well and again with more specific prompts saying "Hey tell me about the history of American trains or whatever." Exactly so like this and like we're still waiting here let's say we're the humans technically you wouldn't wait for something like this right it'd be a more background type of job yeah we put this

56:33

like write me an article about trains routing model is going back and forth here um just activating the agent and then at some point right it needs to know that it has to go to the next agent right like this one for example or most likely it will go to the author thing um it depends on how

56:56

you set up the instructions for the fact checker but the way I was thinking about it is the fact checker is more specialized in just reading through like paragraph formatted like actual articles and then trying to spot errors um so but it could go either way depending on like how you instruct those agents to be called because that's ultimately what

57:16

determines what the routing model selects based on how you describe your agents' use cases so it could be either way yep and so then at some point it needs to be done right it can't go forever so how does it determine that yeah so it fully determines that based on what is the current execution

57:38

phase where we are at what is like the latest stage of everything given the memory that we have of this task execution and what the actual task description looks like so in this case if we ask for an article when you just have a whole bunch of research in memory obviously the agent will be like oh but we don't actually have an article so we

57:56

need to go to the author agent and the author agent then writes something but then the routing model knows that okay we haven't run any fact-checking and that was part of our initial prompt or our task description that it needs to be fully correct whatever so it knows that it needs to call the fact checker at some point so it probably runs through that at least once yeah um maybe there's

58:16

something that the fact checker agent noticed that needs to be changed in which case it knows to go back to the author again um maybe there wasn't in which case it just decides that okay I guess we're fine there was nothing that arose in fact-checking we do have a full article so that's where it ends the execution if

58:34

there was maybe it'll run through the author agent again just to do another draft maybe it'll do another fact check and then see if anything else comes up but eventually if the fact-checking comes up with nothing it already has an author draft and that's what it will then determine to be like that final output of that task execution
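
The loop just walked through — pick a stage, run it, re-evaluate completeness, repeat until done — can be sketched with the LLM decisions replaced by hard-coded rules so the termination logic is visible. In the real network every `nextStage` decision comes from the routing model, and it may revisit stages (another draft after a failed fact-check) rather than run each exactly once; names here are invented:

```typescript
type Stage = "research" | "draft" | "factcheck" | "done";

// decide the next primitive call from the scratch-pad history;
// in the real network this decision comes from the routing model
function nextStage(history: Stage[]): Stage {
  if (!history.includes("research")) return "research";
  if (!history.includes("draft")) return "draft";
  if (!history.includes("factcheck")) return "factcheck";
  return "done"; // completeness check passed: article written and checked
}

// run the loop to completion, recording each call in "memory"
function runTask(): Stage[] {
  const history: Stage[] = [];
  for (;;) {
    const stage = nextStage(history);
    if (stage === "done") return history;
    history.push(stage); // each call is recorded in scratch-pad memory
  }
}
```

Running `runTask()` yields `["research", "draft", "factcheck"]` — the loop only ends when the completeness rule says there is nothing left to do, which is exactly how the routing model decides it's finished.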

58:54

so what's the structure of the memory is there like a template we're using in our agent memory here or sorry the agent network memory or are we using working memory or are we just storing the responses in the thread so the way that it works right now is that it's based solely on resource IDs and thread IDs in the sense that a thread ID is basically or the thread is the

59:20

individual task execution the resource is the agent network um so this is how you can also like go back and look at all the decisions that were made um that's how it's structured right now right so it's just a bunch of messages that were being passed back and forth essentially yeah and those messages

59:38

are really a combination of two different kinds of messages one being kind of whatever is the final output of any given uh primitive that it runs that's one type of message um and actually there's three types total another type is um the actual routing event or like the routing output where it

1:00:02

basically says why it selected a specific primitive which primitive is selected and what type of primitive it is so you could say something like type agent ID whatever is the ID of that agent and then it would say selection reason which is kind of like an actual human description of why it chose to run that agent at this given point in time
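
The record kinds being enumerated here — primitive outputs, routing decisions with a selection reason, and the completeness evaluations described next — could be sketched as a discriminated union. Field names are guesses for illustration, not the exact Mastra schema:

```typescript
type NetworkMemoryEvent =
  | { kind: "primitive-result"; primitiveId: string; output: string }
  | {
      kind: "routing";
      primitiveType: "agent" | "workflow" | "tool" | "network";
      primitiveId: string;
      selectionReason: string; // human-readable reason for the choice
    }
  | { kind: "completeness"; complete: boolean; reason: string };

// the task is finished once the latest completeness evaluation says so
function isFinished(events: NetworkMemoryEvent[]): boolean {
  if (events.length === 0) return false;
  const last = events[events.length - 1];
  return last.kind === "completeness" && last.complete;
}
```

Because every iteration appends one of these three records to the thread, replaying the thread gives you the full decision trail: which primitive ran, why it was selected, and how complete the routing model judged the task to be at that point.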

1:00:22

yep selection reason and then there's a third kind of event which is like a completeness evaluation event um so after every iteration after it's done running any given primitive that it chose to run on the previous iteration it always checks what does it think about the completeness of the task given all the stuff that it's done so far and where it is right now um and for this um it

1:00:52

also gives you kind of this human readable description of whether it thinks it's complete and if so why and if not then why not yeah very cool so what's interesting about this is all this information becomes introspectable as well right since it is like executing steps of a workflow um all of this is recorded um so if you were to want to

1:01:20

like check why any of those decisions were made so that you can maybe fine-tune your prompt or fine-tune some of your agent descriptions or whatever you're totally able to do that um and that information is tied into the traces of the agent network execution as well yeah this is exactly the

1:01:39

dream right this is the dream um that people want but what are the downsides of the loop style because I could already foresee there might be some um well one's definitely cost right like um with every primitive that it's calling not only are you having to pay for the tokens of that primitive whether it's an agent or a workflow that is calling an

1:02:03

agent or several but you're also paying for the tokens of the routing agent itself um the longer the task the longer the memory so the longer the context window of the routing agent um which again contributes to the cost here um there's of course a lot that we could do like summarizing certain parts of that memory and we do want to probably work

1:02:25

on adding some tooling for that so it becomes more feasible especially like for very large and long tasks but there's there's no way around that it's going to be pretty expensive so this is really reserved for more I don't know for the few and select tasks where um it's definitely worth putting that money in like let's say uh there's some

1:02:47

kind of autom automation um that's just make your team work more effectively this is probably worth it worth it but it's not something that you would replace all of your agents with necessarily yeah and like um this is like more of like a more of like a like a like a statement here is like even if we have this utopian solution working like 100% well do you

1:03:15

think we recommend it all the time though uh no I don't think so um I think I think it is really recommended for these like very few and select use cases where it actually makes a lot of sense um because it is fully automatable and it is kind of even though it's nondeterministic and there's a lot of non-deterministic aspects to it you kind of know that it's it doesn't need to run for like days to succeed for example

1:03:42

like the task is still pretty well defined in a way i think those are the use cases where I would probably recommend this the most um and um yeah that's also another consideration for this as well is like the more complex your networks become and your tasks become the more difficult it is to prompt so if you imagine that an agent is hard to create instructions for and it's hard to prompt from time to

1:04:07

time this is now exponentially more difficult because you're having to combine all these different primitive descriptions and you have to write the instructions for the agent network and then your task description is going to be large and complicated because it has to be like transcribable to

1:04:25

these different components that your agent network has yeah um so whatever task you're using this for um it has to be either very clear what all those different components are so it's more doable to actually like write all this information down in a set of instructions that actually can work or it should be like still relatively a

1:04:48

well-defined set of problems that your agent network can then just solve without you having to necessarily write that workflow because it can be painful um at times so yeah there's also this other possibility that makes this even wilder which is this one second where you can have a network in

1:05:13

your network yep and so talk about inception right but that's just yet another primitive like any Mastra primitive that can be accessed will be accessed so like you could create something really crazy if you want um not that it's recommended but um also like this uh you know Tony we were talking about this once about these

1:05:39

background tasks that people can do so for example like if you get an alert kind of coming back to the T3 chat thing in a bit or for a bit also you know Tony and I have been in so many firefights both together and individually that a lot of this stuff is like PTSD when you think about all these things but a lot of those times that we

1:06:02

were in those firefights we spent the first time in the research and discovery phase checking all these systems and alerts and graphs and kind of structured and unstructured data to determine like our first hypothesis of what could go wrong right agent network is a really great tool to do something like that if you can figure out the pieces because when given a trace and the right

1:06:36

workflows to go and like check things like oh go check the uptime of this go look at the database CPU usage go look at when the last release was go do a git bisect on some like you could code all those things into tools and into workflows and then you could essentially when you get an alert let's say and if the network runs in an

1:07:01

appropriate time you could come to a problem with a whole deep research of what happened you know which is why I think when you sit in Perplexity and you're writing some deep research while you go watch a YouTube video that's a dumb use case you know what I mean in my opinion like no one's gonna sit around and wait for it like this all has to happen in the background

1:07:24

and it has to be pushed to you right a network is like a little squad team goes and does some reconnaissance and comes back right um so that's just my opinion right there yeah I think that's ultimately the use case that I would have primarily in mind as well um is something that just

1:07:45

kind of takes some kind of like a tedious problem analysis to action kind of a workflow um off my table um whether it's yeah figuring out something in alerts or logs or something or maybe finding TypeScript errors in the codebase and trying to figure out what changed or all kinds of like these problems of something's wrong why is it

1:08:11

wrong can you just go and analyze what's happening and then let me know what I need to do or if there's something that you can do just do it yep we also had this YC batch mate called Harper and they were killing it during YC they're killing it now and they do insurance for companies and people and stuff and you

1:08:32

know like most insurance providers like you fill out a form right but then based on that information the case worker or the underwriter has to go and pull all this data from different places to assess like what your insurance rate will be how much is your co-pay all that right that's another agent network type of task right

1:08:53

if you have all the tools you know you could build products that you know are essentially form fillers that go and do things you know even like real estate assessment I filled out so many forms about oh like put in your square footage and all this and we'll get you the best deal like today those are all

1:09:13

probably just programs everyone wrote but this could be more dynamic in a way so there's more use cases there that are actually real than when people talk about this stuff i don't feel like they have real good like use cases right like if you look at the LangChain supervisor it's like tell me a joke and

1:09:35

do some math like that's not a real use case because you could do math yourself on a calculator um and you can always come to me for the jokes like why do you need to use a supervisor agent for that um but anyway Tony can we show some uh code like uh just kind of like maybe the internals a little bit and then

1:09:57

we'll we'll bring the other guys on uh yeah let me see um also we have 329 people in here which is absolutely wild um so welcome i'll do a quick recap because I'm supposed to welcome to AI Agents Hour i'm Obby this is Tony we're both from MSRA if you didn't already know probably you do um and then today we talked on AI

1:10:23

news we talked Damn I hate doing this Tony it's so annoying um we should build something that just automatically recaps it for us um okay so what we talked about AI news we talked about T3 chats outage and how you should be more empathetic to people who are operating software applications and how T3 chat is

1:10:42

really cool so you should try it out and all that um and then we also talked about this time that Google did us over with the quotas i'm going to keep saying that in the recap because Google did us over in those quotas um then we talked about Osmosis's SLM small language model that helps you take unstructured text and gets really good results with structured output then we

1:11:05

talked about our tool compatibility research and then lastly we talked about Meta trying to replace 90% of the trust and safety with AI which we thought was very self-serving and probably just for money and now we talked about agent networks as an architecture diagram and then we're going to show some code and if you could uh zoom in like

1:11:26

three times or whatever yeah I was trying to get a good setup here real quick cool so I just merged main into this branch and I think I broke something so but uh I think it's good enough to view at least right now so um is this good in terms of that the font size we're good yeah I think so maybe a little bit one more maybe um there I'll just full

1:12:02

screen so people can see the whole thing yep that looks good no try and get rid of that too okay so as far as the kind of the basis of the agent network itself um you have the agent network class instance uh which has well the ID and the name but also the instructions of the network itself and then this is what really determines what kind of tasks you

1:12:29

can pass to your agent network um maybe if there's any key points that you might want to encode in terms of um like any completeness conditions or anything uh if you're running like this kind of looping complex task execution case then this would be a good place to write up some of that as well you have the model which is

1:12:50

specifically used for the routing agent um and then you have your different Mastra primitives like your agents and workflows and you can have tools here um as well like so and then you have the memory which is again this like scratch pad memory instance used by the agent network itself um whether it is for this

1:13:11

conversational history in you know the first case where we just execute one primitive at a time or the scratch pad memory for this complex execution case that lives throughout that one execution and it will be this memory here now these different primitives here um in this example I have just a couple different agents there's like an article

1:13:30

writing agent and a research agent um they don't have memory so they are mostly just executing pretty small uh small tasks based on whatever whatever you give them in their in their prompts and I have a a workflow that is kind of um doing some more deep research on a specific city um so it's like a very specific research tool that uses these

1:13:52

two agents to kind of put together um a more thorough report on an individual city now as far as this Asian network goes if you were to ask it to do research um let's say on um the biggest cities in France let's say three of them and then to find those cities what it would do interestingly is it would first try um

1:14:16

it would first try to use this agent one here which is like a general research tool to just get only the names like the names of the top three cities in terms of size it would then one by one use this workflow here to do research in depth on all those three cities once it has that it would use the second agent to generate an

1:14:38

article with that content so it kind of knows that okay like for these specific research tasks I have those tools so first of all the only thing that it needs from agent one is just uh the names it doesn't need anything else and then it uses this more specific and more interesting tool to research all those things individually if I were to ask it

1:14:58

about something else though like trains or cars or whatever it would just go directly to agent one and then on to agent two and so this is kind of how the execution flows through this is the programmatic way of doing these more complex like what we call looping use cases um as far as like the single call

1:15:16

use case um programmatically would go it's like the same agent setup uh same network um you could call either generate um if you ask it about a specific city it's going to call that workflow um if you ask it just about the biggest cities then since it's not a specific city it calls agent one instead of the

1:15:38

workflow which again is that research agent uh we can also stream these responses um so here is a case of streaming basically streams exactly the same way as a workflow because the underlying implementation is in fact a workflow um with a routing step being the actual call to the routing agent and then there's another step that either calls an agent

1:16:03

a tool or a workflow or maybe another agent network um and then yeah that's pretty much the programmatic interface um really this will this will mostly be um something that you don't have to use programmatically it's kind of whether you're using use chat or whatever like this just this is the underlying implementation that streams things into

1:16:25

uh into your your playground but uh yeah I think that's it before we dive into the yeah the real stuff that's dope thank you for doing that let's bring in uh some other homies in here so we for our pairing session first we'll bring up the main man himself that's what we call him Taofeeq hi hi what up dude and then from

1:16:58

Strasbourg Marvin what up dude hey folks how you doing so little does everyone know these uh these pairing or these live streams there's this one time that we're on live stream and uh this dude was like "Yo this is the most expensive live stream ever." And we were like "Fuck yeah it is." Um but then we also realized to make it not so expensive we should just

1:17:22

do real work the work that we actually do just on live stream and so that's what we're going to do today folks like we're actually going to do Mastra work on live stream regarding agent network which is why we did this whole preamble because we wanted y'all to understand what we're doing because this is what

1:17:38

we're working on together right now so I'll uh give y'all the screen and let's get let's get let's make some moves who's going to share um okay I'll be sharing my screen so 374 of you here thanks for being here no recap for me right now but uh Taofeeq make sure to zoom in as much as you can and then I'll make this full screen i want to do I do want to educate people on pairing like pairing

1:18:14

sessions i don't know if many people do them in your company i would highly recommend doing pairing sessions one it's really fun two in this age of remote work like um you know it's like the only human interaction you might have uh in a way other than you and your LLM right I don't know but then and then three it's

1:18:34

like really good to learn from others so how we do pairing is we have a driver which is Taofeeq like he's the one sharing and then all of us are like um either here for help support like if we need to do like we're agents right if he needs us to do deep research on the side we'll go do it to help him out that's the goal

1:18:52

of our roles here um and then also sometimes I like right now probably I can't do jack so some there's always some dude just watching and that might be me so okay let's get into it okay yeah um I've zoomed in a little more should I zoom in more something okay cool i think that's better but then you just got like your terminal probably has become smaller or

1:19:20

something yeah yeah perfect yeah looking good dude okay um so so far we're able to like get messages from network stream and I'm sure Tony has talked about the V2 network and the vNext network and now we we're trying to create an interface for you to be able to like use the vNext network chat with it or

1:19:42

whatever right first on on the network table you see your vNext network with the tag vNext same thing we did with workflow like a month ago and now working on this we've not done much but so far let's say we try and chat with say tell me about BMW cars right now we've not really formatted the response but the response

1:20:05

looks something like this is what we're getting from um the stream so like we can see the different um chunks of data that is coming in and the plan now is to build on different types try and create a nice UI for it right and first we can see this where it starts the step and step result this see here it calls the agent tell the agent to research about this and when it is done okay during

1:20:37

that process there's a tool stream with this prompt research about BMW cars including their history and models and then another tool call yeah there's a bunch of things happening here like have you seen the scroll indicator yeah I see it's like so much change yeah so I was trying to think about what could be a UI on the side like

1:21:11

to show all this information because we have we have a lot mhm so I don't know Taofeeq but if we try to show some cards some details about every single one of these instructions we will end up having a very huge conversation like something way too big that's true so my first intuition was like hey can we just at the

1:21:38

beginning because some people might be interested in what's going on but some just want to have access to the result right so some people just want to type something when they know it's working and they just want to have a feedback and other people want to debug the thing so they want to have access to a bit more information and to see how the information is flowing uh through

1:21:58

the agents and through the tool calls and etc so I was wondering if we shouldn't start pretty small with the first event which is start I think and when we have a start event we can just show a collapsible button something that will uh open show a disclosure something like this show a big card or just show nothing so that it does not flood the conversation with a lot of data and then

1:22:24

we can think about how we show all the other information in an interactive way so that people can have in between cards real-time data i don't know what you think about that um if we can start small would be great I think yeah let's do it i think that's great yeah I think it's worth noting that looking at these events here like the vast majority of

1:22:44

these events are what are called tool call delta events and those are basically just the LLM the LLM streaming tokens as it's outputting things um so if we were to remove those it would be probably a smaller set but then that'll be like the individual agent network step execution events and also like we have these rows that have like matching ids right so like the start and stop or start and finish so

1:23:13

those can be collapsed as well you know or maybe have some How about we you know here's an idea what if we start by instead of having three like for example there's three cards for the start and stop right let's have one card that transitions from start to stop for each type you know what I mean okay I'm going yeah
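The pruning idea the team just discussed — drop the noisy tool-call-delta chunks and fold each start/finish pair (matched by id) into one card that transitions from running to done — can be sketched in plain TypeScript. The event shape here is an assumption for illustration, not the exact Mastra stream schema:

```typescript
// Hypothetical event shape; the real stream chunks differ.
type NetworkEvent = { type: string; id: string; payload?: unknown };
type Card = { id: string; status: "running" | "done"; events: NetworkEvent[] };

// Drop LLM token deltas, then fold the remaining events into one card
// per id, flipping the card's status when its finish/result event arrives.
function toCards(events: NetworkEvent[]): Card[] {
  const cards = new Map<string, Card>();
  for (const ev of events) {
    if (ev.type === "tool-call-delta") continue; // too noisy to render raw
    const card = cards.get(ev.id) ?? { id: ev.id, status: "running", events: [] };
    card.events.push(ev);
    if (ev.type.endsWith("finish") || ev.type.endsWith("result")) card.status = "done";
    cards.set(ev.id, card);
  }
  return Array.from(cards.values());
}
```

A UI could then render one collapsible per card instead of three separate rows per step.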

1:23:40

or do you want to go even smaller than that uh Marvin I was just thinking about yeah something even smaller I'm I'm sending you a screenshot so that you And it's an Excalidraw you you are making decisions so we can do whatever we want exactly we are free for the user experience of this one for now we can be creative um and so this is what I was having in mind like

1:24:12

just starting with the thinking process thing and then we can expand with a graph like we have in workflows for example where we can see something very interactive depending on the information that we have but just start with this collapsible thing that you can just press and it expands all the logs all the information inside or you can just collapse it and you don't see anything at all uh just yeah but I mean

1:24:37

we can start the way we want let's do that that's tight can we do that let's do it yeah i think do this luckily I'm not the driver today to be honest the stream has a bunch of Excalidraw in it dude we're just drawing today but I mean if you have this is just something I was thinking on the side if you have a better idea I mean let's go no

1:25:03

no this is great let's do it we'll have to do a lot make a lot of changes here the system message probably have to create a new one but first to be honest I don't know that much about assistant UI and all these kind of things do we have a way to convert one specific message bas based on a type or something to show a UI

1:25:39

yeah that's what I'm thinking about we'll probably have to what we probably have to do is use this tool fallback so right here in the runtime provider going to check if I see something here like different tools like we do in the tool call part i mean if we are able to go in a place where we can basically a factory place

1:26:13

like we have an identifier and we can return anything from there that would be super nice but I don't know if it's possible with assistant UI i assume yes but I don't know how we're about to figure it out yeah just like music dj master baby we have to finish this one too at some point the tool custom

1:26:52

fallback to do this show the result one show one so yeah what type of information do we get inside this component from for the the tool fallback you get tool call ID you get the all that type of stuff that'll take care of all the tool calls for sure on the page yeah yeah so what I'm thinking is because that's we most likely have to use something like this or we create our the

1:27:46

version for the vNext network this is different i I've sent a message to Claude which is one of my best friends these days super nice and he's actually telling me that we can have a custom message somehow that renders the user message or the assistant message from assistant UI and we can conditionally render those

1:28:18

we have this markdown text this is a custom component that we're using right now for system message okay going to check something at the thread level scroll down to markdown text just shows it for I think what are we in the chat right now like when we're rendering those JSON objects um where is that in the codebase can we

1:28:54

go there real quick so this is it right now we just stringify the record we get from the stream nice likely here then we have to start formatting the records like And you can can you pass components in there or or this is the data okay okay okay okay i need Yeah but I think in the if you go in the thread.tsx file which is in playground UI assistant UI some

1:29:23

Yeah the thread file i think there is a way to do it if you go you have somewhere where we register user message i think yeah right here if you go in the components object you can add a message one which is an additional one can add a new new line which is message and you can basically return return some JSX from here

1:29:56

just put a hello world up in there yeah just to see if it works cuz compared to user message assistant message I don't know how it will behave I mean just from the top of my head I'm just curious this just so we have an issue at build time what is the issue fetch command service runtime okay yeah I'm

1:30:48

going to remove that yeah can before building can we try something else just before can you go back to the thread thing i just want to see more information since we are running a build so can you add the props in the message uh component that we see here yeah and just log them so that we know what we got

1:31:15

instead of having to run the second build super nice yeah cool thanks man tabs are the best yeah 411 people in here no pressure Taofeeq no pressure just kidding yeah it's like being at the concert but you're not in the in the crowd you're on stage man so we're just building our depths yep i'm excited by the type of uh UI that we can provide with all this

1:32:26

information that we have and and and I was also wondering uh Tony do do you know if it's easy to to add a started at and ended at information uh I believe we already have that in the stream so Okay there's different type of events and I do think that we have that already okay okay cool thanks but uh yeah if not then it should be

1:32:49

doable um we have access to like the the emitter that page basically puts together the whole workflow stream so if we need to do something specific we can always send a custom event but uh let's first check and see what we have yeah sure super nice thanks actually may like if you mean the actual time stamps i'm not sure that we have those but we have those in the the

1:33:20

actual snapshot so if we have the run ID I guess we we could ask for those but let's let's check what the best way to do that is yeah sure i don't think we have anything regarding the message right it's not the right path I think oh there's a props oh no never mind no Tim not yet we don't have it here so

1:33:44

that's not the right way to do this i was Let's replace But let's replace the assistant message right yeah hello world yeah also chat if you guys uh um never mind never mind if you know if you know assistant UI but you know it's very new library so I like those APIs but at the same time it's sometimes a bit hidden what you want to achieve which seems pretty simple but since it's a bit outside of

1:34:30

what they're supposed to do at the beginning it's sometimes a bit harder to just at the very beginning wow speed is like 10 10 times faster yeah if you just want the basic or out of the box but I'm pretty sure they have a way but yeah we just need to figure out maybe we should invest in like a Storybook too for this so we use mock data

1:35:02

we can also we could also I'm gonna write that down for later yeah I think approach is a better one yeah there's a lot of cool things that come out of pairing session chat too it's like you'll find out things that you need to do for the future um you would only do it by feeling the pain you know are we out here

1:35:31

nice okay let's see the logs hello world we don't have any data that's okay though oh wait that's fine that's fine because there is a hook yeah the record hook or there is a use message or use runtime message in assistant UI that we can use do we have access to use message somewhere i know this one i was trying to use it use message we have it in the assistant

1:36:15

message for example she's here you go yeah if you jump on this one we have access to this use message that should have more information let's copy that let me just create a component and render it's easier to So let's see maybe we'll have everything we need and we have our factory or So try Yeah you can directly pass it i mean without having to wrap it you can just like we did with user message in edit

1:38:58

composer you can just put it as the value of Yeah this works too should work too yeah no no false sure this part is going all right let's build it again the pain of building i mean remember the pain of Webpack at scale it was just crazy remember Gatsby time oh Gatsby dev don't even remind me holy that was slow

1:40:24

man with preview it was pretty nice at the end but at the beginning when you Yeah it was pretty slow do you have an issue yeah there's a build it's hitting 50 errors there's no anywhere here okay what command did you run build i think I cleared I cleared the previous terminal i should have cancelled it though just recreated to

1:41:44

use i'm going to rename you oh you're exporting the same thing and we export that in a in a root file right so it has the same name or something it's a different name though the file is different but the export's the same name right i guess it doesn't matter if it's noted not too much no you're right

1:42:23

and then next find of trade and now you fix the other guy not that break or how did this change d was in any of these files i just But it's a TSC thing like about anyies right but like what would it what would make it have changed all of a sudden yeah the hell okay okay okay okay need to rebuild core oh did you pull or something the last

1:44:14

core build failed it got terminated suddenly so that's why started to go wrong yeah check out your eye data content okay data content contains it information that we need nice let me just try and get the final data and show first yeah I think we can deal with the uh the LLM streaming last so the the delta delta events I think those will probably

1:46:24

be the last event for us to look at yeah we start like step start or something so this if I just return this first I want to find a way to recognize the final data like exactly how to know what it is so I can pass that as the message and then after that start handling the different types yeah I think there's a very specific event um like there's a start event which starts the whole workflow

1:47:10

execution i think there's a similar one for the very last one yeah i think if I check this the last record finish I may have seen it type finish that's possibly it that's the step one but then the the one below that is just the overall finish event which is like the whole workflow's finish execution Yeah this finish does not have anything

1:47:51

other than one ID this step result yeah this everything is type of step result there are so many step results so step result from each step but these record logs are not the ones that are coming from our component right they're coming from somewhere else those are all coming straight from the stream okay yeah that's the

1:48:43

actual like workflow streaming so this is like what we want to observe and then we just want to display it somehow so I think as far as the steps go like we know what the step ids are so we could just only look at the the routing step and the actual execution step for this particular view and then just record the

1:49:05

outputs of those so we know what primitive was run and why uh which we can then display separately and then we know the final result as well from the other one so perhaps that's the way to go is first of all filter by type step results but then also filter by just the specific step ids right um I'll just say it's

1:49:43

here the full chunk So each one of these is an assistant message that's just the text is JSON right like just a Yeah the text is cool can we go to that wrapped assistant message then yep i was talking in the in the void i was muted for like three minutes oh man oh dude that happens on pairing sessions too that's go to um

1:50:09

that where we're displaying this assistant thing yeah then let's JSON parse this oh this thing sorry like the the new one that we have created the one with use message well I mean we can try it here if you want yeah which one which one are we actually using right now yeah this one should come from yeah so but we don't want to use markdown text do we or No I think we don't need what's below

1:51:21

i mean yeah we could in a in a sense that if we have other assistant messages we still want them to be shown like this so for example if we since it will just be the new assistant message we will still need to have support for what was previously existing mhm um now I think do you think we need this JSON parse part thing because if I remember correctly we

1:51:47

stringified the object at some point to show it in the UI and I'm just wondering if if this one is needed yeah if we want to like see the type for each object parsed okay because we are using type text right so we're inserting each message as type text assistant with a JSON stringified payload which is I guess fine I think isn't there a type data but

1:52:14

I don't know we'll come back to that let's just roll with this but um that way so then the markdown text component that we're passing in the components object is just taking that content value as text and rendering it that's why we're seeing our JSON stringified right okay okay so I was thinking you know just as a hack

1:52:34

right not for real we could not use markdown on text and we could do something in there um because we are using these message components this is why I mean I think this assistant UI thing has put us in a little bind on this case because we're using this message primitive content so it's just based on the type text and content it'll

1:52:55

just render that component you know yep oh but this is a singular message right like this is this wrapped system message represents a single message there right yeah exactly so we don't have to use this like the message primitive at all if we didn't want to i mean we we'll still need those we'll still need those because we are overwriting the overall assistant message so it corresponds to any message

1:53:30

that is sent by the assistant and so we are adding capabilities on top of that so it means that we shouldn't remove what's what is existing now but what we can do is put an if here like if type is start show a collapsible else just show the markdown stuff or whatever yeah let's log that parse content real quick
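Marvin's if/else can live in a small pure helper: parse the stringified record, then map its type to a render decision. The type names and decisions below are ours for the sketch — they would need to be checked against the actual chunk types in the stream:

```typescript
type Decision = "collapsible" | "hidden" | "markdown";

// Parse the JSON-stringified record; plain non-JSON text falls through to null.
function parseRecord(text: string): { type?: string } | null {
  try {
    return JSON.parse(text);
  } catch {
    return null;
  }
}

// Map a parsed record to how the assistant message should render.
function decideRender(content: { type?: string } | null): Decision {
  if (content?.type === "start") return "collapsible"; // disclosure button
  if (content?.type === "tool-call-delta") return "hidden"; // raw tokens: skip
  return "markdown"; // everything else keeps the existing renderer
}
```

Keeping this as a pure function makes it easy to grow into the "factory" Marvin mentioned earlier without touching the rendering components.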

1:53:58

when you have to do builds so this is just a general tip for the audience because we're in this like right now we're in a like in a situation where we have to build this stuff every time so usually when you have fast iteration loop you can do a log hit save and you're off to the races but now I would

1:54:14

say make as many changes as you can at once and then hit build i don't know if we have incremental build on TypeScript in the project but that might be something to look at if we don't but I'm pretty sure Word has already tackled this let's write that down as well just going to improve that cuz we can use a watch step or something i mean build watch or whatever

1:54:56

what do we I mean I mean just to be honest I would just put if parsed content type equals start return a button or something and then we can try to because I'm suspecting that we will have an object tree somehow that we will mhm augment on every iteration that we will fill like a store or something like this or a state that will be filled on

1:55:31

every iteration and this is what will create the chart so this is how I feel about it this is how I see it going because we have access to I don't know that's just a feeling but no no I think so too need a faster way to prove Um yeah let's try simple stuff first and then we can iterate on complex stuff
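The "complex stuff" is the looping execution Tony walked through earlier: a routing agent repeatedly picks the next primitive and writes each result to a scratchpad until it decides the task is done. Here is a toy model of that loop, with made-up primitives and hard-coded routing rules standing in for the real routing LLM — none of this is the Mastra API:

```typescript
type StepResult = { primitive: string; output: string };

// Stand-ins for the network's primitives (all outputs are canned).
const primitives: Record<string, (input: string) => string> = {
  researchAgent: () => "Paris,Marseille,Lyon", // agent 1: names only
  cityWorkflow: (city) => `report(${city})`, // deep-research workflow, one city
  writerAgent: (reports) => `article[${reports}]`, // agent 2: final article
};

// Stand-in routing agent: pick the next primitive (or null = done)
// from the task and the scratchpad of prior step results.
function route(
  task: string,
  scratchpad: StepResult[]
): { primitive: string; input: string } | null {
  if (scratchpad.length === 0) return { primitive: "researchAgent", input: task };
  const cities = scratchpad[0].output.split(",");
  const reports = scratchpad.filter((s) => s.primitive === "cityWorkflow");
  if (reports.length < cities.length)
    return { primitive: "cityWorkflow", input: cities[reports.length] };
  if (!scratchpad.some((s) => s.primitive === "writerAgent"))
    return { primitive: "writerAgent", input: reports.map((s) => s.output).join("+") };
  return null; // routing agent decides the task is complete
}

// The looping execution: route, run, record, repeat.
function runNetwork(task: string): StepResult[] {
  const scratchpad: StepResult[] = [];
  for (let next = route(task, scratchpad); next; next = route(task, scratchpad)) {
    scratchpad.push({ primitive: next.primitive, output: primitives[next.primitive](next.input) });
  }
  return scratchpad;
}
```

In the real network the routing decision is an LLM call and the scratchpad is the network's working memory; here both are hard-coded so the shape of the loop is visible.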

1:55:58

tony is waiting for the complex stuff to happen output let's do a tool call delta let's just do step finish or step result yeah step result and I think I will say something for this so step result output results tool call delta so much if it's tool if it's tool call delta let's return null is there a use case where we would like them to be shown up showed up not

1:57:32

showed up yeah not Sure they should update the component in place though right it has the same ID so I mean I think whenever whenever like you're in this oneoff use case where we pick an agent to run the tool call delta is basically the streamed output of that agent so I think we should just show it as it's coming up

1:57:55

um so I think that's the that's like at least a very valid use case where it doesn't really make sense just to wait for the agent to generate the whole output because we have it streaming back right so of type so something like Yeah for this for this thing we will build UI components in not a very regular way that will be fun yeah it's gonna be super interesting
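The in-place update Tony suggests — streaming the agent's tokens as they arrive — boils down to concatenating tool-call-delta chunks into a buffer per call id, so the UI re-renders one growing component instead of appending rows. The field names below are guesses at the chunk shape, to be checked against the real stream:

```typescript
// Assumed delta chunk shape, not the real schema.
type DeltaChunk = { type: "tool-call-delta"; toolCallId: string; text: string };

// Append each delta to a buffer keyed by tool-call id; the UI can then
// re-render the component for that id with the accumulated text.
function accumulate(deltas: DeltaChunk[]): Map<string, string> {
  const buffers = new Map<string, string>();
  for (const d of deltas) {
    buffers.set(d.toolCallId, (buffers.get(d.toolCallId) ?? "") + d.text);
  }
  return buffers;
}
```

In a live UI this would run incrementally per chunk rather than over a finished array, but the keyed-buffer idea is the same.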

1:59:03

yeah unfortunately we have only one more minute of the live stream but it's okay we'll be ready this is a good uh See she's almost done yeah can probably just show what's going on almost done almost done okay let's see what we get is it for all the marbles 30 seconds left wow wow wow book no oh man that's hilarious dude undefined

1:59:45

text okay let me confirm something yeah we'll figure this out we'll figure this out yeah okay let's leave it at that sometimes you don't get a victory like we did last time with that music player but timing is timing so let's let me stop your share Taofeeq so let me stop that now here or there we're all back okay so

2:00:13

there's 525 people in here right now which is dope um and this is the end of the stream i'm gonna do a quick recap and then we're all gonna say sayonara so today we first of all this AI agents hour if you just showed up and you didn't read now you know um today what we did we went through AI news we talked about T3 Chat's outage how you should have empathy towards uh people who operate

2:00:39

software companies and products we talked about Osmosis's SLM which takes unstructured text and turns it into structured output um way cheaply then we talked about uh Mastra's tool compatibility research that we did and then finally we talked about Facebook's or Meta's commitment to adding AI for their to

2:01:05

replace humans with AI for their privacy and content moderation policies etc which we then made a joke that it's probably not for it's just to save money or make money um then we went through agent networks tony gave us a whole um uh architecture overview of it we drew some Excalidraws that I think people liked and then finally we did a pairing

2:01:28

session because we're trying to take that agent network and visualize it in our playground and so yeah thanks everyone for joining us today we'll see you tomorrow same timeish and maybe say different people though but uh yeah so see you thanks thanks for being