Google Analytics agent, reinforcement learning algorithms, building an MCP Server, and some AI News
Today we talk about AI news such as Jules from Google and the SDK for Claude Code. We look at an example Google Analytics agent that Paul from Mastra built, learn more about LLMs and reinforcement learning algorithms from Andy Lyu of Osmosis, and try to build an MCP Server with Mastra.
Episode Transcript
hello everyone and welcome to AI Agents Hour i'm Shane this is brought to you by Mastra and as you'll notice I am also without my co-host again today so we'll have to reach out to him and get on him for you know slacking a little bit but you know he's been in Japan he'll be back I'm hoping tomorrow but we will see Obby we kind of miss you but not that
much but a little bit anyways we're going to talk today about what's happening in AI we're going to meet someone from the Mastra team uh a new member of the Mastra team that none of you have probably met before we're going to do a segment today called builders learn ML if you are a builder if you consider yourself a builder which is what I would consider myself you know you're building websites you're building apps but you're
not necessarily from a machine learning background then that's the segment for you we're going to bring in a guest for that someone who has forgotten way more about ML than I'll probably ever learn so we can learn from him and then we're also going to bring on someone else from the Mastra team and together we are going to build an MCP
server and hopefully you build it and maybe even get it deployed today we'll see how far we get so that's what's on the docket today if you have not already please make sure you are following me on X at smthomas3 also make sure you follow Mastra on X you can also find us on YouTube as well give us a subscribe please and you know if
you're interested in learning AI agents and you have friends that are interested in learning about AI agents share the show we appreciate that so let's talk a little bit about what's going on in AI yesterday we talked about quite a few things from the Microsoft Build conference so you can check that out from yesterday if you're interested to learn about some of the things we talked
about we talked about GitHub's new uh coding agent that's going to be built in pretty excited about that we talked about uh kind of Microsoft's MCP support which was interesting to me and the CTO I think was quoted saying something like MCP is the protocol for AI basically equating it to what HTTP is for the web and so
that was really cool buy-in from Microsoft i think even though a lot of us are maybe thinking that MCP isn't the perfect standard and perfect standards probably don't exist anyways it is nice to have a lot of the big players kind of collaborating at least or agreeing that this is probably going to be the standard so better to fix an imperfect standard that we can all agree on than
try to have a whole bunch of competing standards so I think MCP is here to stay and so we're going to build an MCP server later so let's talk about some things that were at least interesting to me throughout uh the day you know since we talked yesterday something that came up on my radar is Jules by Google and I don't think it's necessarily new but
it's kind of I think getting some definite hype right now around trying to compete with Codex which OpenAI just talked about trying to compete with GitHub's you know coding agent that was just announced so Jules is from Google and you know kind of talks through uh some of the things that you can do here if we go down step by step it kind of walks
us through select your GitHub repository and branch write a detailed prompt then Jules is going to go out and fetch your repository clone it to a cloud VM and develop a plan using Gemini 2.5 Pro then it'll provide a diff of the changes so you can kind of approve the code edits and then Jules will create a PR of
the change so you can approve the PR merge it into your branch and publish it so that's it it's very similar to what you get with Codex and GitHub's coding agent from what I've seen haven't tried it of course curious if anyone here has if you are in the chat let me know and if you are watching this for the first
time this is interactive this is in fact live a lot of people ask is this actually live or is this pre-recorded no this is live so if you have uh a comment for instance Aditya here says hello excited about today's video thanks for uh the comment so you can see Aditya is watching on LinkedIn but you might be watching on X you might be watching on
YouTube so leave a comment if you have questions along the way and we'll try to answer most of them another thing that came up on my radar today is uh Anthropic's Claude Code now has an SDK and I thought this was pretty interesting so you can essentially integrate Claude Code into your applications using the SDK and this
makes sense for Anthropic you know Claude Code has been notorious for burning through lots and lots of tokens so why not let you burn through more tokens in your own applications uh but jokes aside I think this is really cool because now you can kind of take some of the power of Claude Code which I've heard from a lot of our customers that I talked to that are building AI applications that
they really like Claude Code better than Cursor better than Windsurf because they can kind of give it a task right from the CLI and just let it go uh it is a little bit more expensive maybe but the price has come down a little bit and it seemingly gets at least in some instances better results from the people that I've talked to but the SDK
means you can kind of integrate it into your own applications so if you want to build some kind of agent on top of Claude Code maybe this becomes more possible now so very cool next up on the list so there is this uh tweet thread that came out a few days ago so it's not necessarily news today but I thought it
was worth discussing because it was pretty interesting around uh whether you should consider something an AI agent or whether it's agentic AI and I think there's a lot of uh questions around what is considered an AI agent and this is kind of small maybe I can make it a little bit bigger all right and so there is a you know thanks Elvis for this uh
thread we're going to read through your thread but it is all around kind of this white paper AI agents versus agentic AI it summarizes the distinction between AI agents and agentic AI i think a lot of this is in my opinion kind of trying to split hairs but let's just go with it and see what they have to say
so it tries to provide a comprehensive taxonomy comparing what's considered an AI agent versus what is considered agentic AI so trying to clarify some of the differences so they're saying AI agents are autonomous software programs that perform specific tasks and agentic AI is a system of multiple AI agents
collaborating to achieve complex goals again not sure that I agree with that but you know they're trying to make definitions the autonomy level so they're saying AI agents have high autonomy but agentic AI is like a step level above so this is almost like what at Mastra we've been calling agent networks or it's like
complex systems AI agents typically handle single specific tasks agentic AI handles complex multi-step tasks requiring coordination likely between agents I would imagine um so for the applications customer service chat bots would maybe be an individual AI agent where supply chain management the entire
management process is a system of agents so it's agentic AI and it tries to define what are AI agents and this is a pretty uh tough thing to define because it seems like more and more almost everything's considered an agent what was often just an LLM call yesterday is now considered an agent today and you know my hot take is that whether something is an agent or not is kind of
a spectrum it's almost like how agentic is it from minimally agentic maybe there's a single LLM call to much more agentic where the LLM's making decisions and deciding what tools to call and then even more agentic is you have multi-agent systems and so this is just a different way to kind of split
that discussion I think but it's like saying agents are single entity systems enhanced with LLMs and external tools i would agree with that that makes sense it can you know work on narrow well-defined tasks and respond to changes but it defines and this is where it gets a little more broad agentic AI as an architectural shift that involves
multiple collaborating agents with dynamic task decomposition persistent memory and orchestration layers uh so it uses robotic swarms as you know a potential example here and again I don't think this is necessarily wrong i do agree that there's probably some use for some definitions but I do think
there's a ton of gray area between what's a single agent and what I would consider agentic AI in this case and this is where it really starts to break down at least for me is like this application mapping it tries to be pretty specific about what you'd consider you know an individual agent which would maybe be customer support email
filtering personalized content recommendation so these are kind of more narrowly defined tasks which makes sense and then agentic AI is like a multi-agent research assistant um you know collaborative medical decision support multi-agent game AI so there's a lot more you know in this and you can see there's the whole paper here
it is interesting to start having these conversations about what do we consider AI agents versus you know a network of agents or in this case what they call you know agentic AI so I thought it was kind of an interesting topic to bring up curious what you all think how do you define an AI agent in this
day and age because I think everyone's definition is just a little bit different the next thing I want to talk about today is something that's coming up in a few weeks AI Engineer World's Fair June 3rd through the 5th in San Francisco so I'm curious if anyone who's watching this are you attending are you you know planning on going a lot of
people are going to be there and one of the cool things is I don't know how I'd actually find it because there's so many speakers but uh our co-founder Sam is going to be speaking he's somewhere on this list i don't know where but a ton of really great high quality speakers it's going to be a really good conference from
everything that I've heard and I would imagine you're going to learn a lot if you can go so I unfortunately will not be able to be there this year but a few people from the Mastra team will be so please if you do come to the AI Engineer World's Fair find Sam he will give you a book you can grab you know
one of these Principles of Building AI Agents books and learn even more about building AI agents so find us at the conference and we'll get you a book all right and with that I did want to bring in a special guest we're not necessarily done with what's happening in AI yet but I want to bring him in for maybe some commentary so I'm bringing in Paul from the Mastra team
so Paul welcome to AI Agents Hour good to see you hello hey how's it going all right yeah it's going well it's going well um all right so before we introduce you I did want to bring up so I wanted to start doing something on AI Agents Hour ideally every day because I think it's just a fun thing to do where we basically call it the open-source project of the day so I'm just going to
pull up an open source project on GitHub and we're going to try to send them some stars so if you're watching this let's uh start typing in GitHub in your uh in your browser and let's send some stars towards a cool AI project this one is uh some friends of ours from the YC days and they just released this and it's been shooting up the charts as
far as number of stars i'm sure it's trending or it will be soon our friends over at Browser Use just built and open sourced this Workflow Use uh repo on GitHub that already has 2,000 stars i don't even know how old it is the initial release was 4 days ago so it's shooting up in the number of star counts
but it's essentially a more deterministic way and the reason this is interesting to me is because we kind of realized this at Mastra too a long time ago when we decided we needed workflows we needed to have an agent primitive and a workflow primitive the idea is that the agent can make a lot of decisions but if you can make something more
deterministic it's going to be more reliable and higher quality and so that's why we have workflows as well and so it's really cool to see that Browser Use is kind of thinking along the same lines as well as like if we can make some things a bit more workflow driven we're going to get better results and so you can have like smaller agents that
are doing more specific tasks and so let's give uh Workflow Use a star if you are watching go over to GitHub give our friends a star and while you're there go to the Mastra repo and give us a star if you haven't already so any comments on that Paul no I mean it sounds pretty good and I think yeah you know some of those um ideas are you know
the driving force behind a lot of other things that we see in front-end frameworks and the deterministic approach so I think a lot of these paradigms cross over between different areas and yeah I do agree with you like breaking it down you can have agents that do specific tasks and bring them all together um you will probably
yield a better result yeah absolutely if you're in the chat if you are listening please suggest some uh other open source AI related projects if you would like and we'll grab one to feature tomorrow on the live stream we do this every day of the week Monday through Friday right around noon Pacific time give or take a few minutes and we
usually go for an hour sometimes we go a little longer we'll see how long we go today but with that Paul let's get a little introduction so you're pretty new to the Mastra team but can you tell me a little bit about your background what you were doing before and what brought you to Mastra yeah um so we're now on day four at
Mastra um background goes back to 2005 as a Flash developer that's where it kind of all started and then when Apple decided not to support the Flash player I kind of switched over into JavaScript land um since then I've worked various contract roles React developer um more recently uh as you know I worked at Gatsby in DevRel moved on to Cockroach
DB in DevRel and then Neon in DevRel and then more recently uh technical product marketing at Neon um and then I just had to get involved with Mastra it's just too exciting with what you're doing uh the growth has been incredible um and it's really great to be back with some of the old team yeah it is kind of funny it's like a mini Gatsby reunion
in a lot of ways a lot of people that at least intersected with you know people at Gatsby quite a few of them are on the Mastra team but it's fun because you're working with people you know and you trust and you know that are going to really help move Mastra forward in a lot of really good ways sure and I suppose maybe one thing
I didn't mention there is specifically what I'll be doing at Mastra um which is the docs that's my primary focus uh while I'm here um there's a lot of things that are moving and changing pretty regularly and we just want to make sure that the documentation is as up to date as it can be so that it's as easy as possible to get started and find your way
around uh so that will be what I mainly focus on i'll be in and around the Discord as well so if you do have any questions uh feel free to tag me username is pauliescanlon um yeah look forward to seeing you yes so what Paul's saying is don't at me at Paul no I'm kidding i'm kidding but also I think one of the things around building an ecosystem
around a project like Mastra is the docs are really their own product because without really good docs that are consistently kept up to date and very clear you just can't really build a good community around a project like Mastra i think so many times you have very promising projects where
the docs are just not kept up to date or kind of an afterthought and that really does kind of hurt long term because people are excited about trying it but then they get in the weeds a little further along outside the basic quick start and then they start to run into problems and so um I do think that our
docs are pretty good but I'm pretty consistently disappointed in ways that I know we could do better and so that's one of the exciting things of having you on board Paul is to have someone that is really dedicated to trying to make our docs better every day and excited to see what you're
able to do over the next couple weeks yeah yeah and I mean I would agree like the docs are pretty solid i think it's more just a case of so much has changed and so much moves and you know it's a very fast pace it's easy for little things to slip through if someone's not 100% focused on it so um yeah I won't
let you down Shane i'm going to stay on it well I appreciate it and if you're listening and you see things wrong with the docs we always accept PRs too you know the docs are in the repo so if you see little things if it's bigger let's tag Paul in and come up with a really good solution but if you notice
small issues we do accept PRs all right so I think we wanted to maybe go through as you've been kind of onboarding you've been building more with Mastra and we wanted to talk through a project that you had started and it might be a good learning opportunity for others that are interested in building AI agents for the first time or just seeing you know someone else go through the process of
building an AI agent for the first time yeah um do you want me to share my screen and I can give you a little bit of a demo and then we can talk through it yeah can you give us the 20 second context then yeah let's share screen and let's see what you have okay let me see if I can figure out how
to do this entire screen now I'm only on one screen so I can't see you can you see my screen it's looking good we might need one bounce of zoom one click of zoom one click of zoom okay uh so this was actually part of my onboarding but also I wanted to get a last minute submission into the hackathon so I didn't spend an awful lot of time on this but I figured I'll just give you a quick demo and then
I can explain a little bit more about it um I hop over to the Mastra playground so this is a Google Analytics agent so I've hooked it up with my Google Analytics and I want to be able to ask my Google Analytics questions so a simple question would be do you know how I know you used to be in marketing
because you built a Google Analytics agent yeah yeah that's fair enough that's you know that's actually partly the reason uh so I'm going to say "Show me page views from the last 30 days." and my agent should go out it's got an agent and a tool i don't know why that gets called three times maybe that's a little something we can look
into uh here you go i get my page views for my site paulie.dev uh lists the top viewed pages and um can you give me one more click of zoom all right is that any better yeah that's better okay um I can't really make any of this go away can I no yeah if you click the sidebar it should go away in the middle of the two the left sidebar and the
if you scroll go a little more right a little more right right there oh a little left there it is there we go it's hidden that's an Easter egg I guess because it's not very clear that that exists yeah um so yeah it gives me the URLs and a page count um gives me the page titles and a page count i also get top referrers like where the traffic's coming from um it was at
some point also giving me countries and cities um but I guess we can dig into that um so that's the agent and then I guess if I show you the setup um so no storage and we're going to need some zoom on that as well a couple clicks there you go one more there we go wow that's like jumbo on my screen yeah you know people like to be able to read here
it is it's always like I always say one click more than you're comfortable with yeah it's giant for me but uh I can see this from across the room but good i'm glad you can see it um yeah it's a really simple setup I don't know if you're telling me I'm getting old and maybe I need glasses or I like to think that I'm just speaking for the you
know the others in the chat that are watching this but yeah I'll be honest I needed the extra click so yeah that's fine can you see okay now I can see it hopefully all of you in the chat uh can see it too um yeah so you know very simple it's just an agent there's nothing else um and here's the oh actually here's the
agent which is my prompt um I basically started with the weather agent from the docs and just kind of adapted it so I liked the way that prompt was written you know it kind of explains um you know what the agent is and what its tasks are um I did introduce uh a little function here just to make sure that the date was correct for some strange reason it
kept thinking the date was the first date from the Google Analytics response which was in 2023 so I had to be sure that today's date is today um I'll be honest this is a problem that I've seen many times when you're dealing with LLMs because you would think that they would pass in the context of the date automatically but you know I think for caching reasons and such they don't
necessarily always do that so I've seen many many examples where people you know do need to pass in the date and I'm not actually sure i do think that in some cases like the way prompt caching works I think if you pass in the date towards the end you can actually still cache parts of the prompt at the beginning but we're not going to worry
about that today pass it in wherever and uh yeah that seems to work having it right at the top um I mean I could maybe try it another way no I think that's good that probably gives you the best results is having the context at the top um and then yeah really just kind of followed what was in the weather app
example so it explains uh which tool to use um I've tried to explain that it's only to fetch data once per request uh it did depend sometimes on the query it was trying to hit the API multiple times and I was actually getting rate limits from the Google Analytics API um in the example you showed it still looked like it did call
it three times yeah so perhaps this isn't actually working as well as it needs to um maybe we can look at that yeah um then there's some specifics here now this I guess is kind of interesting because the Google Analytics API kind of has its own I wouldn't say query language but you know it has these shorthand methods 7 days ago 14 days ago 30 days ago and that's to make when
you're writing an API call a little bit easier than having to pass in dates um but it only does that up to 30 days so for anything over 30 days you do need to give it a date and I tried to be explicit here and say look if it's a date then it needs to be in this format otherwise the Google Analytics API won't
work um this I'm not sure about i don't think the Google Analytics API has a maximum of 30 days i think it can go back longer in time but I'll explain a little bit about the issues I ran into with that shortly um similar thing with the limit i only wanted to get 10 results back it obviously can do a lot more and a lot of this is actually I think related to
token usage um because the Google Analytics response can actually be pretty big and I was finding I was running out of I don't know what it was i was just getting errors that the response was too large um and then I guess this is more just how I would like the response to come back and um if there's anything missing
I want to prompt the user um there's things here like if you want to query by city or country uh that's just to make sure that it's actually getting the right information um and then these are specific to the dimensions in the Google Analytics API again I'll talk you through the code of that in a minute but page URL page title page referrers and cities are actually all dimensions you
can use to query um yeah at some point when it was returning country names and city names it was also returning a flag which was quite nice in the response to see a list of um you know top countries or top cities that have visited the site um then just some additional information to include in the response so just to explain what the date range was um and
then the total views for each dimension um I did need to strip this from uh the response for the page titles that's just something that I include in my site and it ends up in Google Analytics which kind of makes the response look a little messy um I'm using OpenAI GPT-4o mini i haven't tried any others uh again I just kind of ran with what was in the getting
started guide but you know I think for a 2-hour job it worked and I was pretty amazed that it actually worked as well as it did first time um yeah where things got a little bit wild however was the tool where I'm actually making the API call yeah and before we go on we got a couple questions in the chat sorry I can't see so yeah do you
want to hit there's a couple questions we will uh highlight here so one question from YouTube all right from Jamal and this is I think previously before we even kind of dove too far into the code but how do you use a debugger in Mastra i managed to pause on breakpoints from the Mastra build folder i wonder how I can pause on breakpoints from my Mastra source folder looks like he's using VS
Code and Jamal I don't know the answer to be honest with you but I would actually jump in if you're part of the Mastra Discord i think you should ask it there i think you'll probably get a much quicker response i honestly haven't used the debugging in VS Code for a while um but if you're able to do it in the build output then I imagine there's something
to do with the compile step of going from TypeScript to the compiled JavaScript code i think in a lot of cases I just remember back in the days when I was debugging Gatsby I would often have to debug the JavaScript version not the TypeScript
version and so I don't know there's probably better solutions than that i don't know if I know the answer though so unfortunately I can't help you but I bet someone on our team in Discord might be able to at least tell you if it's possible or not uh then there's another question from Aditya uh from LinkedIn should the
prompt be written in a fixed way for better output or just a hit and trial method i'm assuming like a trial and error type method so there are a ton of different guides available on just prompting in general and structuring of prompts it does vary a little bit per model different models respond to things in different ways there's different ways to structure prompts i believe if you
go to the Mastra blog you know in Sam's book we do have a section on prompting i think we released that prompting section in a recent blog post let me see if I can find it here and that might be a good place to get started but there's also yeah let me share a screen here uh let's see so there's a recent blog
post if you go to the Mastra blog post on May 13th which just has some tips but again I wouldn't say this is the complete guide there are some much more detailed even much longer guides for how to think about prompting and the other thing that I would just mention is it does change quite a bit
because as the models get better the techniques for how you prompt those models uh do tend to change so hopefully that's helpful uh there is a lot of trial and error though as well depending on what models you're using all right Paul we're back at it yeah yeah yeah I mean one thing I maybe would
add to the prompt um question this is something I've been doing is actually using ChatGPT to help with the prompt so once I get something to this sort of stage I will run it through ChatGPT and just be like is this clear you know and see what it says see if there's any improvement so it's a bit
meta that you're using a prompt to get a prompt you want to see something cool that I bet you didn't even know existed uh sure yeah so go to the playground and over there in instructions yeah I think I know yeah so you can actually add a comment and enhance and there's actually a new
design of this coming out or maybe if you updated it you'd have it so it's a little bit clearer what it's doing but we don't need to oh I get it you're being real brave let's just see what happens just telling it to enhance no context nothing um if I save it and then if we run that query again should we just see what happens yeah I mean you can save it try
it let's do it so if I do um so I think you need on the right you need to select version two and click the play button on version two so you're saying that's the active one okay so show me page views from the last day and uh eventually we'll have this write to your file system right now it's just cached um oh uh let's see page title and I
don't think this is going to work because I don't think you have memory so we can add memory that's another thing we could add but otherwise maybe we can go back to your other prompt this is the trial and error uh edit yeah yes there's a lot of trial and error if you click on expand on one of those tools can we look at what the results are and if you look at the others are they
the same yeah are they all the same key title was that oh so it's calling it with different keys that's interesting oh do you know what this might actually be i was going to take you through the tool in a minute but um yeah maybe it's doing it for each well let's have a look
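What they're seeing here, one tool call per dimension, is consistent with the `key` being part of the tool's input schema: the model plans a separate call for each dimension it wants. A rough, dependency-free sketch of that calling pattern (the names are hypothetical, this is a reconstruction and not the actual demo code):

```typescript
// The dimensions the agent is allowed to query (hypothetical names,
// mirroring the "page URL / page title / page referrer" dimensions
// described in the demo).
const ALLOWED_KEYS = ["pageUrl", "pageTitle", "pageReferrer"] as const;
type GaKey = (typeof ALLOWED_KEYS)[number];

interface ToolCall {
  range: string; // e.g. "30daysAgo" or an ISO date
  key: GaKey;    // one dimension per call — this is the culprit
}

// Because `key` is an input parameter, a question like "show me page
// views from the last 30 days" invites the model to fan out: one tool
// call per dimension it wants to report on.
function planCalls(keys: readonly GaKey[], range: string): ToolCall[] {
  return keys.map((key) => ({ range, key }));
}
```

With three allowed keys, `planCalls(ALLOWED_KEYS, "30daysAgo")` yields three calls, which matches the three tool invocations visible in the playground.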
so here's the tool um I guess we can well let's just skip that bit and go straight here so that that's your tool that you know again very similar to the weather agent actually it just accepts um two inputs uh the range of the query and then the key and the key is what I've defined up here as and these are actually the dimensions that I use to query Google Analytics so I wanted to somewhat restrict it so that
I'm not trying to query for say um device which you know I see so so that's why it's making three calls is because it's doing URL title and refer right well it's calling the tool three times what what how does this tool return different results if if that key is passed in well the key I mean the key should just be one of the keys um I think it's maybe
where it's going wrong is because I'm saying well no it shouldn't i mean if the prompt was show me page views from the last 30 days i mean it that's where it kind of asks you what do you want like the title URL or referral i don't think it should kick it into calling three times should it or should or is it
or is that actually the problem i I I think it's thinking that it needs to call because you in you have the key in the input schema of like so you're saying what key do you want and I'm I think you're saying you you always want just those keys return so I almost think that you could just we could just write that in the git analytics
um function to just return those keys but not have it be part of the input schema yeah maybe i mean I think what I was trying to do was ensure that when I make the actual query to Google Analytics it's only going to be based on a number of things that I've already hardcoded do you know what I If I for instance if if my prompt was show me um device or browser usage from the
last 30 days like the actual tool itself isn't querying Google Analytics for that data so you wouldn't get a response so I think what I was trying to do is that if if something like that appeared in the prompt then it just wouldn't query it or it would you know it would ask you to refine based on the keys that it's allowed to query for does that make
sense yeah I guess maybe not I'm not quite following so for instance okay maybe this will help clear it up then so if I were to add another dimension to Google Analytics for instance like um I think browser is browser one yeah so browser so in order to get information back about the browser you would have to add a dimension here right so if I
wanted to also do browser I could do that but if I didn't have that if if we go Yeah if we go up though then okay so wait to the get just the get analytics function because you're passing in the key which is you know one of those values and then how are we using that key if you scroll down scroll down a little more you I think I saw some somewhere
down here we're using that key right good question where is that key it's it's down yeah I think if you go down a little bit further I saw something oh that's to reduce rows so only only show rows that have that key is that what it is is that what Yeah and the reason for that is um so as I think I mentioned at the beginning like I was seeing instances
where if I queried for longer than 30 days I was getting this giant response from Google Analytics and that was causing errors with like I don't know what it was like token usage max size or something like that so my idea here was rather than have the agent try and aggregate totals or values I just reduce them so that the actual response coming
back from the API is smaller but I guess what we could do just remove that all together and just return the rows and then yeah then don't even have keys get passed in and remove that from the input schema and then to be honest I probably don't even need Yeah I think we just get rid of it and then I think that I think
that'll fix our getting called multiple times problem and get rid of that and let's just get rid of this for now and all of that makes it a little bit easier and then now I'm sure we'll run into whatever problem you just had which is ton of data yeah we'll see let's see what happens but if that is the case we
could always bring that back but just hardcode what values we care about and if it doesn't have one of those properties then we don't reduce or we don't include it now it's calling it four times we're making progress but the wrong way yeah i mean I have a feeling this would be Let me look at what one of the Yeah so this without that kind of reduce i
don't know whether it's it's either the reduce or it's the Yeah so it's it's overflowing the context so yeah so you must reduce down to it must been returning a bunch of data so if we bring that back though and we just what what if we just said we're only going to return data that has a URL you know what I mean so we just like hardcode the the key just to test this
thing out so in that reduceRows if we just put like URL or title or whatever the one yeah URL and then we still got rid of the key up top you know what I mean so it's not getting passed in to the input schema uh where's that here so get rid of that and then yeah get rid of it on line six as well yeah
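The refactor just described — dropping `key` from the tool's input schema and hardcoding which rows survive — can be sketched like this in Python (the real tool is TypeScript/Mastra, and the field names and `reduce_rows` helper here are illustrative, not the actual Google Analytics API shape):

```python
# Instead of letting the agent pass in a `key` (which led to the tool
# being called once per key), hardcode the dimension that rows must
# carry, so the payload sent back to the model stays small.
HARDCODED_KEY = "url"

def reduce_rows(rows):
    """Keep only rows that have the hardcoded dimension, and only
    the fields the agent actually needs."""
    return [
        {"url": row["url"], "pageViews": row["pageViews"]}
        for row in rows
        if row.get(HARDCODED_KEY)
    ]

rows = [
    {"url": "/home", "pageViews": 120, "city": "London"},
    {"url": "", "pageViews": 3, "city": "Paris"},   # no URL: dropped
    {"url": "/docs", "pageViews": 45, "city": "Oslo"},
]
small = reduce_rows(rows)
```

With no `key` in the input schema, the model has nothing to enumerate over, so it stops fanning out into multiple tool calls.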
just because then it will only return things that do have the URL which is what it seemed like it was always calling anyways okay show me page views for the last day oh well done Shane there we go we did it yeah and now that's good um and I guess the problem then becomes if I wanted say city so for instance I couldn't do show me the cities for
uh page views last days i don't think that's going to be able to return anything helpful is it oh it still does so it's more useful than it was before Shane by removing functionality funny how that works i think my idea was uh that I wanted to make this kind of reduce function dynamic so that I can reduce
by you know any number of um I guess any number of the bits of information that come back because you'll see in in like a response you know I might want to group by the city or the country rather than by the URL um then again you know if it's a page view a page view is always going to have been triggered by a URL so
it would have country and city and a referrer associated with it yeah well I mean it's good yeah we made it better we cleaned up some code we looked at how it worked and you know now everyone knows how to uh build a Google Analytics agent so for all of you marketers who like to
write TypeScript you're ready to go hook this thing up and build yourself a a Google Analytics agent yeah cool and let me know how you get on and see if and give me some tips yeah absolutely well Paul uh this was awesome thanks for coming on we will definitely have you come on again and I think we have some ideas of some other things we can build so maybe you should come back next week and we can go a little deeper
into the Mastra world and build some cool agents sure all right thanks for having me nice to see everyone see you later all right see you Paul okay everybody if you are just tuning in we are doing this live so you are watching AI Agents Hour we're 43 minutes in and we're still going strong if you are watching this on YouTube or LinkedIn or X feel free to
just drop a comment we will see them along the way you know we have someone suggesting some projects to maybe uh consider for what uh what to review for open source project of the day tomorrow some strong contenders but if you have comments leave them we'll try to answer as many as we can next up we're going to talk through uh something that I think I will
appreciate hopefully you all will as well i would consider myself a builder but I am not I do not have a machine learning background however my friend Andy is much smarter than me when it comes to machine learning and he's going to teach us some things so I'm gonna bring Andy in and we are going to uh
jump into something I like to call builders learn ML andy what's up hey what's up not much good to see you it's been a while since we we talked last yeah yeah it's been a couple of weeks right yeah yeah we had the the YC uh kind of group section and you weren't able to make it unfortunately so it seems like you
weren't feeling well but I'm glad you're feeling better now there was a wave of illness going around the Bay Area and I got hit unfortunately but I'm all back now yeah well it's good to see you and excited to have you on today to teach us all something so jokingly I do refer to Andy as Professor Andy because he has taught me a lot in many uh honestly just
some short conversations and so hopefully he can teach us all some things but maybe it'd be useful Andy can you give people just a little background on first you and then you know kind of what you're doing yeah sure so I'm Andy I'm currently building a product called Osmosis a platform that will help you train your AI agents on certain tasks so if you have an AI agent that's doing web
browsing or AI agent doing tool calling uh you may find out that with prompt engineering it's kind of it'll go only as far as the context you provided but if it's reasoning about something that's incorrect or undesirable then there's not really much you can do but we think that this should change and we believe that the future of agents will be one
big really complex reasoning model with many many small agents on on open source models performing their own task right so we are training these small models to be really really good at one specific task and then scaling it to potentially hundreds and thousands of agents working together that way yeah that's very cool can you what about a little bit about your background before what were you doing before this
so I used to be a tech lead at TikTok I was a lead for the recommendation system on the data infrastructure side so mostly the data infra powering stuff like the For You page as well as live e-commerce um the challenges there are really twofold you first have the ML background that's required
because essentially recommendations is a big big ML system um the second part is data infrastructure right because at any given time you have actually hundreds of millions of requests that's coming in every single second how do you process them how do you dedicate GPU to the correct requests and how do you serve all of them in a very very low latency
right because TikTok as a product you're scrolling videos every few seconds so how do you serve that at scale was also one of the challenges that I had to deal with yeah and I also heard uh I don't know if this is a rumor or if this is true but you were either the youngest or one of the youngest tech leads ever at TikTok is that right yes
yes uh I don't know if the record has been renewed now but yeah okay well yeah so as you can see this is one of the reasons I brought Andy on he's wicked smart and he is going to teach us some things about um machine learning did you have things that you wanted to talk through today you know talk through with
the audience of course this is interactive so if you have questions chat just leave us a comment we'll try to answer some of them but Andy what did you want to talk about today yeah I can just go kind of go over the the amount of uh post-training methods people use right now to improve AI models um I think probably everybody here is
familiar with supervised fine-tuning um the idea of you're giving the model an ideal input and output and then you want the model to mimic that behavior right I can talk a little bit about why that works and why it doesn't and the more recent advances that we've gotten um especially with the release of models like DeepSeek and Qwen um how they have improved from the SFT
stage or supervised fine tuning stage to more and more advanced algorithms to help models improve yeah i mean can you give people just the So there's a lot of people that are getting into AI that don't have any kind of background at all in ML so can you give us the short you know explain it like I'm five example of what is uh
supervised fine-tuning sure so this goes back to how a language model works in the first place so the language model works as a probability predictor what that means is if I give the language model a sentence like today's weather is the language model will then try to predict what is the next word that will most likely come out of that sentence right so it could be nice it could be bad it could be great
um that's how the computation happens and that's how the next token is chosen now with supervised fine-tuning is that if we want the model to always say the weather is good then we will take the probability of predicting good the word after the weather is and then try to maximize that right so this becomes more and more complicated uh once we have
additional um additional data like for example if we have a massive paragraph or a lot of Q&A sessions um we want to push the model into giving you the highest probability to get to the ideal like response that the model will give you right so this makes sense in in theory because if you're telling the
model a lot of questions and then you're asking the model to train itself so that it will answer in the correct way supposedly this behavior would then feed back into the other questions that you have not asked in the training set but one of the problems with supervised fine-tuning is that this technique is quite brittle right you can imagine as I'm just telling the model what the
correct answer is but I'm not really telling it why and I'm not really teaching you how to get there right i'm only telling you hey if I ask you A you you respond with B right so if I if I'm asking you C maybe you'll respond with D but if I'm asking you like oh one two three then you have no idea what to do
right because you don't have that intrinsic reasoning capability um that's why supervised fine-tuning was was kind of the first stage in post- training but then there's many more newer algorithms that came out to uh make this whole process much better awesome and I think next is okay what are some of those newer algorithms that people are are starting to do in their
use in their post-training process yeah another one that people have been using is called DPO the full name is Direct Preference Optimization um the idea is that in SFT we are always giving good examples of what the model should be doing but we're actually not giving the model bad examples right like if
we're let's say for um like let's say for uh emotion detection right if I'm only telling the model um hey how are you feeling today i feel great how are you feeling today i feel fine the model is always going to be pretty positive right and emotions but you also want to say like oh if the model for this
specific um input text is saying that oh I feel great but it's actually the correct answer is this user is actually feeling frustrated or feeling negative then you should penalize the model if it does try to say I feel great right so direct policy optimization introduces the ability to not only have positive
examples but also negative examples right and the way that is trained is that it will take the negative examples it will take a look at um the probability of predicting the next negative token and try to minimize that right so you're both minimizing the probability of a bad token from being predicted and you're trying to maximize the probability of a good token being predicted so this is how you get
initially an opinionated model more than just one that knows how to respond very generically and systematically to some question okay I'm learning some things today because I had no idea what DPO that's the right acronym DPO I have not heard of DPO uh chat have you heard of DPO do you have any questions for Andy so if you have any questions please leave them
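A minimal sketch of the DPO-style preference loss just described — reward the chosen answer, penalize the rejected one, both measured relative to a frozen reference model (the log-probabilities below are made-up numbers, not from a real model):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def dpo_loss(logp_chosen, logp_rejected,
             ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """DPO loss on one (chosen, rejected) pair: raise the policy's
    log-prob of the chosen answer and lower the rejected one,
    relative to the frozen reference model; beta scales how far
    the policy may drift from the reference."""
    chosen_margin = logp_chosen - ref_logp_chosen
    rejected_margin = logp_rejected - ref_logp_rejected
    return -math.log(sigmoid(beta * (chosen_margin - rejected_margin)))

# Policy already prefers the chosen answer relative to the reference:
low = dpo_loss(-1.0, -3.0, -2.0, -2.0)
# Policy prefers the rejected answer instead: the loss is higher.
high = dpo_loss(-3.0, -1.0, -2.0, -2.0)
```

So unlike SFT, every training pair carries both a positive and a negative example, which is how the "opinionated" behavior gets in.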
in the comments okay so we got you know supervised fine-tuning we have DPO are there other things that are common in the post-training process the next one that I'll talk about is called PPO so PPO is essentially a form of reinforcement learning um and the full name of it is
proximal policy optimization now what PPO does um this is actually one of the algorithms that was used to post-train OpenAI GPT models and I think it's the algorithm powering the OpenAI reinforcement fine-tuning platform uh you actually have I believe three or four LLMs that exist in this
whole training setup um here's how it works in a nutshell you have the model that you want to train or the model that that will be trained to do some like tasks that you want to do that's called a policy model so the policy model will then be asked to do the task that you want to train it on like for example get today's weather help me book a flight
etc right uh the policy model will then make its own uh predictions make its own judgments and then once it's done and completed with that um with with the uh training run it will then ask a reward model a reward model can be uh it can be either a very simple machine learning model or it can be another LLM this model's job is to grade how well this
single run is by looking at all the tokens that are predicted and saying yes this is a good prediction or no this is not a good prediction and then what is it so what's the output of that reward model is that just like a yes or a no a zero or one or is there some kind of grade some kind of
scale or does it yeah there is a there's kind of um implementations may differ but you can think of it as like like a floating number like between some range okay this will tell you how well it's doing right and then based on how well it's doing um it will also um there's there's another model it's called a checkpoint model uh you can think of
like when your policy model is being trained the initial version of that policy model is cloned into what's called a reference model so essentially you have two models at the same time doing the exact same task one model that's being trained and one model that stays static the reference model is
there to kind of hold back the policy model the idea is because you're always uh because the reward model may actually push the policy model into many unknown areas and many unknown directions you want to kind of hold it back by keeping its behavior in check right so an example of that I guess it would
be uh if you keep on training on like what is the weather today in San Francisco and if you don't have a reference model the policy model might go off in all different directions right it may start looking at the food food items in San Francisco it might look at the restaurants there it might look at um the geographical data it it may do a
lot of unreasonable things and the reference model is kind of there to say like hey don't do that keep on track we still want to answer the question correctly you may notice that we are actually also relying on the reference model to have some capabilities here and this is why PPO is only valid in post-training right
because when you're post-training a model this model already has some ability to reason to answer questions and to follow instructions but if your model has no ability to do any of these things then PPO will not be helpful because your model is outputting gibberish there's nothing to optimize towards so okay now I have questions
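The core PPO update just described is usually written as a clipped objective — here is a toy single-action sketch (a real implementation works over token sequences and also carries a KL penalty against the reference model; this only shows the clipping idea):

```python
import math

def ppo_clip_objective(logp_new, logp_old, advantage, eps=0.2):
    """PPO's clipped surrogate: scale the advantage (how good the
    reward model said this output was, relative to baseline) by how
    much the policy's probability of that output changed, but clip
    the ratio so a single update can't drag the policy too far from
    the old one -- the same "hold it back" role the reference model
    plays in the description above."""
    ratio = math.exp(logp_new - logp_old)          # new prob / old prob
    clipped = max(min(ratio, 1 + eps), 1 - eps)    # keep ratio in [0.8, 1.2]
    # Taking the min makes the clip a pessimistic (conservative) bound.
    return min(ratio * advantage, clipped * advantage)

# Policy doubled the probability of a good output: gain is capped at 1.2x.
capped_gain = ppo_clip_objective(math.log(2), 0.0, advantage=1.0)
# Policy halved the probability of a bad output: penalty still applies.
penalty = ppo_clip_objective(math.log(0.5), 0.0, advantage=-1.0)
```

Maximizing this objective nudges the policy toward higher-reward outputs while the clip (and, in full PPO, the KL term) keeps it anchored near the reference behavior.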
the policy model is the model that's getting trained mhm and the reference model is like the model that was trained in the pre-training process but is not currently being trained so it's kind of like using itself is that am I understanding that correctly yes exactly okay yeah so that's the TL;DR on
PPO and then you may realize that hey we're only generating one output and we're only grading on one output can we do multiple and it seems really expensive right you have a reference model you have a reward model those are two LLMs and LLMs are really big right so fitting them on GPUs and scaling them again that
seems very clunky and that's where another algorithm called GRPO comes in right so GRPO can be seen as an extension on top of PPO it's made popular by DeepSeek because that was the main algorithm that was used to first train DeepSeek's math reasoning model and then later on to train DeepSeek R1 um the model that we all know so what GRPO has done is decided to
say hey I don't actually need a reward model I can very quickly verify the correctness of a solution based on some rules right so the rules that DeepSeek used were two rules one of them is are you following the format I gave you when I ask you to answer a question so this kind of ties into the whole instruction following ability of the model and then the second one is to say
um did you get the correct answer right so the correct answer tests the reasoning ability of the model um so you're telling me that DeepSeek was trained basically by asking did you follow the instructions did you use the right format and did you get the answer right and that's what's behind the post-training
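The GRPO recipe just described — sample a group of outputs, score each with simple rules (format plus correctness) instead of a reward LLM, and use reward-minus-group-mean as the advantage — can be sketched like this (the `<think>`/`<answer>` tags and reward values are illustrative, not DeepSeek's exact setup):

```python
import statistics

def rule_reward(output, expected_answer):
    """Rule-based reward, no reward model: (1) did you follow the
    required <think>...</think><answer>...</answer> format, and
    (2) is the final answer correct."""
    reward = 0.0
    if "<think>" in output and "<answer>" in output:
        reward += 0.5   # format-following reward
    if f"<answer>{expected_answer}</answer>" in output:
        reward += 1.0   # correctness reward
    return reward

def grpo_advantages(rewards):
    """GRPO scores a whole group of sampled outputs and uses each
    output's reward minus the group mean (scaled by the std) as its
    advantage -- no separate reward/value LLM to host on GPUs."""
    mean = statistics.mean(rewards)
    std = statistics.pstdev(rewards) or 1.0   # avoid divide-by-zero
    return [(r - mean) / std for r in rewards]

group = [
    "<think>2+2 is 4</think><answer>4</answer>",  # right format, right answer
    "<think>hmm</think><answer>5</answer>",       # right format, wrong answer
    "4",                                          # no format at all
]
rewards = [rule_reward(o, "4") for o in group]
advs = grpo_advantages(rewards)
# Outputs above the group average get positive advantage (reinforced),
# outputs below it get negative advantage (pushed down).
```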
yes that was what was used to train DeepSeek-R1-Zero yeah and then there's actually some interesting behaviors that emerged from training DeepSeek-R1-Zero just by doing that I think the exact policy was you must think in the thinking tokens and then you must answer the question in the answer tokens right so that's how you get the idea of test time
compute scaling is that um the model learns by outputting more tokens it has a higher probability of sampling its way to the correct answer now these models are actually trained on multiple languages so not just English it could be English Chinese or you name it right there's hundreds of languages used in training um and what DeepSeek-R1-Zero realized is just by enforcing these two policies if
you were to ask a question in English it may actually think in multiple languages so it may think in English for some uh for some parts and then it will switch to Chinese and switch to Spanish it it's mixing and matching a lot of different languages with no regard to following like because language was not like a
policy that they enforced right so it's kind of like do what you want get to the correct answer um a really interesting intuition that can be learned here is that maybe different languages excel at representing different ideas right so the model is understanding like hey I have this idea that I want to express
but I don't really know how to express it well in English but I do know that this concept is expressed very well in let's say Korean so I'll do that in Korean and then figure it out later so that was the behavior of DeepSeek-R1-Zero so that's incredibly interesting I wonder though wouldn't that be a function of where the training data came from you know if
I had a lot of training data on this specific topic in this specific language I might be more likely to reason about it in that language is that a totally you know way to or potential way to think about that yeah totally it could it definitely is a function of what distribution of pre-training data you've
got of these different languages right i think common crawl is a very common data source people go to um it's mostly web pages crawled from the internet but then you may have other data sources um that is specific to that language and then because that language is spoken by those country uh right that country or those countries of people they may have
cultures or they may have focuses that is different from English speaking countries or um Spanish speaking countries etc so what we're realizing is that maybe to get the best performance it is feasible to let the model reason about in different languages right but there is actually a really big drawback of that
and it's the explainability part um machine learning models have been pretty much unexplainable since the invention of AlexNet which is like an image recognition classifier model that was released way back in the day um but after that these models have become a black box and the idea of thinking tokens or test-time compute gave us a
hint of how these models reason but if they're thinking in different languages or in completely incomprehensible ways then we the humans have no way to understand what the model is really doing right so that's why when DeepSeek released their model they released both DeepSeek-R1-Zero and the
regular version of DeepSeek R1 so the regular version of DeepSeek R1 you can think of had that extra constraint added on saying like okay you must think in the language that the user asked you in so if I'm asking you in Chinese you must think in Chinese if I'm asking you in English you must think in English um and consequently in the paper this actually
decreased performance by a little bit but it's now explainable yeah I was going to ask that it seems like okay well then in that case the explainability goes up but obviously it has to make trade-offs especially if you're asking it in a language where maybe it doesn't quite have as much training data in that language or just for whatever reason it's not as
beneficial to think in that language on that specific topic exactly yeah um I think the paper was stating that it was a couple of percentage points um in terms of performance hit but because it's so much more explainable and so much easier for a human to understand that's why DeepSeek R1 is the model that most people use in production right now
yeah and I think that makes sense the explanability probably matters a lot when you're building applications on top of these kind of things mhm but you know peer performance of course is important depending on the the application yeah exactly uh so I think we can there I don't know if you have more but there's a just a couple quick I don't know if these are comments we can really answer but we can highlight so Pan
I think this was before Andy joined but I would like to build a Figma to code agent with custom design system is that possible yes that's definitely possible I think you could use Mastra or many other uh frameworks or tools to build that so now how do you go about doing that that's you know more nuanced but if you come to our Discord we can uh maybe
point you in the right direction this one is a little bit more geared towards you Andy uh I'm trying to build an a ML project for face detection for events so everyone can find the people uh background with just a photo but not but he doesn't have a machine learning background so it's been hard professor Andy please help i don't know if we can solve this
but uh maybe point him in the right direction it seems like uh usually for facial detection uh the most common model people use is a convolutional neural network um it's a really simple image classifier um that will just tell you like hey given some input image what category is it now uh depending on what
faces you're trying to predict uh it may drastically increase the challenges like if you are somebody that's uh if you work for a security company or just for some like maybe like user tracking product and you just want to to you have a preset of the people that you want to predict that's fine you can do that pretty easily right uh but if you want
to identify anybody like Watch Dogs or like I don't know some uh intelligence thing then that's a lot harder right and it's not because you don't have access to the best algorithms it's that you don't have images of everybody's faces right so you can't really reliably um detect that you could
kind of maybe like go into social networks but then data privacy concerns does come up and that's not something I would recommend you doing yeah absolutely great question though thanks for that answer Andy okay so we talked through a number of post-training algorithms or techniques right so we had you're going to have to remind me of all these uh I'm drowning
in acronyms right now so we have was it RPO PO it was uh yeah so the first one we talked about was SFT or supervised fine-tuning the second one we talked about was DPO or direct yeah Direct Preference Optimization and then PPO which is proximal policy optimization and then finally GRPO all right so uh some learning uh
some follow-up learning for me and many many others watching here to continue our our knowledge uh anything else that you wanted to talk through today Andy um sure I can yeah there's actually a lot more development since the since the uh release of GRPO i can kind of uh briefly talk about it so you might wonder like it seems like this is a very straightforward evolution you have SFT
and then you have DPO and then you introduce RL algorithms like PPO and GRPO then is there a reason for SFT to exist and the answer is yes there definitely is uh a very common way that people post-train models these days is a combination of SFT plus an RL algorithm like DPO or GRPO the reason for that is if you notice um for the reinforcement learning
algorithms we are kind of expecting the model to at least give you an answer that's somewhat correct right like somewhat close to a correct answer like if the model is only outputting complete gibberish you're done right there's nothing you can really optimize off of that right um for GRPO
this is even more so the case because if you're generating five iterations of the outputs and none of them are correct just completely incorrect then which one do you optimize towards right so that's actually one of the inefficiencies of GRPO is that if every
single answer is correct or every single answer is incorrect that iteration is kind of useless for the training process right because we don't know where to optimize towards you already got everything right or you don't know what you're doing that's why we still need SFT uh to exist in the sense that
we need to use SFT to first enforce a certain behavior right like for example if you're asking a model to respond in a certain format and the model is just not doing that well then you still need to do SFT to make sure that the model can at least respond in this format and then we can use GRPO to increase its
reasoning capabilities and its ability to arrive at the correct answer right or if the model is only outputting correct answers well that's a really good problem to have because the model's doing great but at the same time maybe your policy is not as difficult or your training data is not as difficult right so how do you then um how do
you then optimize towards that right so let's say uh in the training data your model is getting 80% of the data correct which means there's only 20% of data that your model can actually use to train on how do you uh how do you actually train on that data correctly uh and how do you find out what data the model needs to be trained on that's the
next iteration of GRPO which is called DAPO right so DAPO is a technique released by the ByteDance Seed team and is an extension on top of GRPO where we say that okay we realize that GRPO actually has a lot of problems uh in terms of efficiency so let's optimize it by only sampling data that the model is kind of iffy on that the model may get incorrect results on and then use that
to train the model to ensure that our computation is being effectively used and as a result of DAPO we are getting the same performance as if you were to train a model with GRPO in only half the amount of time required so we've drastically compressed again the amount of compute required to get to the
same level of performance okay now I am uh my brain might be about to explode but I feel like I learned a lot so uh I don't know chat how you're feeling hopefully you all are getting some value out of this uh okay so you said like DeepSeek was kind of predominant in using GRPO correct and so
the last one was DAPO am I right Yes are there any models that have been using that successfully or is it still kind of new and experimental what's the state of that quite new yeah um the only model that I'm pretty sure uses DAPO is one recently released by ByteDance called Seed-Coder it's a coding model that does really well on coding
benchmarks like SWE-bench um and then a bunch of other software engineering tasks uh and that's trained using DAPO but it's more like a proof of concept because that model is not big uh but if you were to scale it I'm pretty sure that's something that a lot of people are doing right now in the frontier labs
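The dynamic-sampling idea DAPO adds on top of GRPO — skip prompt groups that are all-correct or all-wrong, since reward-minus-group-mean gives them zero learning signal — can be sketched as (a simplified illustration, not the full DAPO recipe):

```python
def keep_group(rewards):
    """DAPO-style dynamic sampling (simplified): a group where every
    sample is correct -- or every sample is wrong -- has a flat reward,
    so the group-relative advantage is zero everywhere and the update
    is wasted; keep only groups with mixed outcomes."""
    n_correct = sum(1 for r in rewards if r > 0)
    return 0 < n_correct < len(rewards)

# Per-prompt groups of sampled outputs, 1 = correct, 0 = wrong:
batches = {
    "too easy": [1, 1, 1, 1],  # all correct: nothing to optimize toward
    "too hard": [0, 0, 0, 0],  # all wrong: no direction either
    "useful":   [1, 0, 1, 0],  # mixed: this is where the signal lives
}
useful = [name for name, rewards in batches.items() if keep_group(rewards)]
```

Filtering this way is what lets DAPO spend its compute only on prompts the model is "iffy" on, which is where the claimed 2x efficiency over plain GRPO comes from.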
scaling those algorithms to get better models awesome yeah this this whole concept of pre-training post-training I'm sure if you are you know if you classify yourself as a builder like me this stuff's probably new to you so if you're tuning in hopefully you're learning some things i think there's a lot of benefits
to getting more knowledge of kind of the machine learning and some of the you know the background of how all this stuff works because I do think as you're building AI applications or AI agents having some knowledge of the fundamental training process of these models and how these models you know quote unquote
think and act is actually improves your chances of getting better results from the models um all right so any others we we've talked through I've already I've already forgot the al the acronyms again but uh what else do we have is that is that it for today i think I'll just leave uh one note to everyone is that uh
reinforcement learning is a really powerful technique but it would it is not a magical algorithm right it's very deterministic it's very predictable um and reinforcement learning when we see it improving model performance is mostly because it's making the model sample much more uh in a predictable manner what that means is uh if you're if
you're talking to ChatGPT you may notice that if ChatGPT doesn't get the answer the first time if you keep nudging it if you keep giving it hints it may eventually get to that correct answer and that process is called the model sampling its way to the correct path right so if the model is able to do that then we know this model has
the base intrinsic ability to arrive at the correct answer and what reinforcement learning does is it it picks out that path and makes it so the model goes down the correct path every single time it's not magically letting the model reason better it's just letting the model uh go down the path that arrives at the correct
answer uh in a more predictable and more deterministic manner awesome if you are watching this live whether it's on X LinkedIn YouTube if you do have questions for Andy speak now or wait till the next time we bring them on which hopefully will be in soon in the future but let us know if you do have any questions and we'll we'll wait
we'll give people another 30 seconds to a minute to get their questions in but Andy would love to have you come back on and you know continue to educate us builders about ML so we can get better at building AI applications and while we're waiting for any questions that might come in how can people follow you
uh learn more about what you're doing what are the best ways to kind of keep in touch yeah I think the best way is to check out our website which is osmosis.ai um and that should have um the blogs that we're releasing um we're releasing a later one today so if you guys want to check that out and our
social should be also linked there as well so here's the website um and then I will I will actually share uh I will share this as well in on the screen just so people can uh see this so if you wanna you can learn more about what Andy's doing at osmosis.ai so go there check that out i'm I'm sure people would
uh and then also yes on on Twitter as well or X you can follow Andy as well let me put that on there and see if any any chat comments or questions have come in so there you go follow Andy on X learn more about Osmosis reach out to him with some questions or if you need help you know getting better performance from from your AI applications looks like we don't
have any qu any more questions today but been a pleasure to have you on Andy and you know always good to chat since we didn't get a didn't get a chance a couple weeks ago um in person but definitely having me on yeah definitely interested in having you come on again and teaching us more around uh just ML
topics and if we do have things that come up maybe I'll I'll I'll bring a topic next time and ask you questions and just have you teach me maybe for a few minutes okay sounds good all right Andy thanks for thanks for tuning in and thanks for uh teaching us yeah of course anytime see you all right everybody so we are
live brandon thanks for tuning in uh Brandon I think you uh you need to come on the live stream i think you had a a uh hackathon submission we should be talking about right so we should talk we should talk about that and get you on the live stream but we've been doing a bunch of things today so thanks for tuning in
this is AI agents hour but we are more than an hour in at this point we've talked about what's happening in AI we met Paul from the Mastra team and talked through an agent that he was building to help him with his Google Analytics data we got to learn some machine learning uh topics and techniques around mostly
around post-training and kind of fine-tuning and supervised fine-tuning reinforcement learning from Andy honestly my brain's about to explode i am not a machine learning expert obviously if you've been watching for a while you understand that and you probably aren't either but if you are a builder like me and you want to learn
more about this stuff which I think if you're building an AI you should hopefully you got some value out of that and we do have one more segment of the show today before we wrap up and as I mentioned earlier Microsoft's all in on MCP Mastra's all in on MCP a lot of others are all in on making MCP the protocol for AI and we want to
learn today how you can use Mastra to build an MCP server and ideally get it deployed so we do have uh someone from the Mastra team i don't know if he is ready to go or not but we're going to find out pretty soon daniel are you there it looks like you are so I'm going to bring you on hey welcome back how's it going thank
you it's not not your first not first time on the the live stream but this will be the best time yeah yeah and every time I see I see this the production value just keeps going up and up and up yes yeah we get better here a little bit better every day that's the that's the motto soon you're going to have animations pop
up and all that the only So the only thing I worry about is at some point you know you you have to figure out like what's the weakest link to get a little better at some point that's going to be me and I have to replace myself with AI or someone uh someone more smooth talking and better looking but we'll
we'll for now I'll hold down the fort yeah either that or you'll have to uh take that guitar off the wall and give us a little concert yeah i I have a feeling I like to tell people so I do play guitar um I like to tell people if I'm at a bar I'm good enough where you're not going to get up and leave but
if you're walking by you're not coming in you know so I'm I'm at that level of skill where it's like okay this guy's not bad i'm not going to leave my seat but I'm not walking in to hear this guy play so I I think I just remember one time you told us what your uh go-to song to play was like your your like party trick song and I might be remembering this wrong but I'm like 95% sure that
you said Toxic by Britney Spears that is one of my go-tos yeah I will play a mean acoustic rock version of Toxic by Britney Spears and it's ever the crowd-pleaser for sure but uh Daniel what are we going to do today uh you tell me i'm I'm here for the ride okay well I know you and Tyler on the team have been working on just improving our MCP support and so maybe it'd be
cool to talk a little bit about that and then also if we could just you know dive into some code we like to we like to see code around here and can we build an MCP server in you know 20 20 or 30 minutes and actually get something out the door and shipped and maybe have have people that are watching this try to use it in
their you know their tool of choice is that possible can we do that today uh yes that is very possible okay let's do it sometimes I ask questions and I don't know what the answer is going to be and you know sounds like we can do it though so let's do it well we we can't disappoint you so we just always have to say yes uh
yeah well hopefully not hopefully that's not true but uh please please uh give it to me straight but first what's new with MCP in the Mastra world what what are we doing um so one thing that So for a while we had an MCP I mean I guess I could just show you uh I'll pull up some I'm actually working on adding resources to the MCP server right
now uh uh so I guess like our our long-term plan is to uh like what I'm working on right now is to just make sure that our MCP server and MCP client both support the full MCP spec so it's uh it's pretty rare uh like if you if you look through a lot of the the clients that are uh that have MCP support um I think there's only one that supports the full spec and it's a Python
um library and so the majority of MCP clients uh right now just because it's so new and uh a lot of people aren't really um well people are starting to ask for more of the spec but like uh so people are starting to slowly roll that out um but the majority that they just support is like tools the tools is kind of like the the big kahuna there it's like if
you don't have tools you don't have like like what's the point and so everything else is just kind of adding on yeah and I think at least most people when they think of MCP servers they just think a collection of tools that's that's all that's what MCP is to most people so if you if that's what you think MCP is well today you're kind of right but the spec is a lot more
than that although there are some things in the spec that I'll be honest I don't fully understand like what the heck are resources i don't know what are resources prompts i know what prompts are there's tools obviously MCP servers have tools do you know what resources are does anyone know what are resources i mean that's what I'm I'm adding right now so I could tell you if you really
want yeah tell me honestly I have no idea what I know again I have not read the full spec probably should but but like many of you I'm busy i I I do not read every white paper i try to read some of them i'm just trying to I mean that's why we're here yeah i'm just trying to hang on like all of you watching like this stuff's changing and
you can't keep up with all of it but I know the spec has prompts it has resources but I don't know what resources are obviously I know what prompts are but te tell me a little bit about what what are some of the things that we're adding from the spec and maybe you can tell us a little bit about what what those things are uh so so resources is a way to dynamically fetch
some kind of like like database entries like text files some some kind of like data that gives extra context to to uh the the LLM so just like adding adding it into context um and so it's something that so let's say you have like uh a bunch of different users and you want to grab some like information about a specific
user and so you fetch that information that information can change over time as well so there's things built into the spec to allow uh those to like dynamically change and for the server to say like hey this this resource has updated uh you should fetch the the updated resource so the client will get like the updated resource and so it's just a way to like manage uh this like
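Daniel's description of resources can be sketched as a tiny in-memory registry with the two operations the spec defines, list and read. The shapes loosely follow the MCP spec's resource messages; the registry, URIs, and data here are all made up for illustration.

```typescript
// MCP "resources" (sketch): addressable pieces of context the server
// exposes, which a client can list, read, and re-read when the server
// notifies it that a resource changed.

interface Resource {
  uri: string; // stable identifier, e.g. "users://42/profile"
  name: string;
  mimeType: string;
}

const store = new Map<string, { resource: Resource; text: string }>();

store.set("users://42/profile", {
  resource: { uri: "users://42/profile", name: "User 42 profile", mimeType: "application/json" },
  text: JSON.stringify({ id: 42, plan: "pro" }),
});

// resources/list: advertise what context is available
function listResources(): Resource[] {
  return Array.from(store.values()).map((e) => e.resource);
}

// resources/read: return the *current* contents for a URI
function readResource(uri: string): { uri: string; mimeType: string; text: string } {
  const entry = store.get(uri);
  if (!entry) throw new Error(`unknown resource: ${uri}`);
  return { uri, mimeType: entry.resource.mimeType, text: entry.text };
}

console.log(listResources().length); // → 1
console.log(readResource("users://42/profile").text);
```

The point of the indirection is the update flow: because the client reads by URI, the server can later send a "resource updated" notification and the client re-fetches fresh contents before the next LLM call.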
dynamic context that will get passed into uh these LLM calls okay cool anything else on the spec that we're adding uh yeah we're going to add uh prompts as well um how does Do do you have an idea of how that works is that just like stored prompts that work well from the or that the MCP server has saved that
you can pass on from the MCP server to your LLM call or uh Yeah so so a lot of this stuff I find so my my big qualm with like all this this MCP stuff is that everything is labeled like all the the names for things like uh all the additional things for the spec other than tools like resources prompts sampling roots they're all named kind of poorly so it's not
very intuitive as to like uh what they do so you would think like um uh prompts is like a way to like prompt the LLM um but but really it's it's a way to basically Yeah like like suggest like use these prompts um for this MCP server like this is like a list of prompts to use for the MCP server but it doesn't actually like help with like executing the prompts but there's another uh thing that we'll also
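The prompts feature as Daniel describes it — a list of suggested templates the server advertises but never executes — can be sketched like this. The template names, arguments, and registry are hypothetical; only the list/get split mirrors the spec.

```typescript
// MCP "prompts" (sketch): the server advertises named, parameterized
// prompt templates; the client picks one, fills it in, and sends it to
// its own LLM. The server never runs the prompt itself.

interface PromptTemplate {
  name: string;
  description: string;
  render: (args: Record<string, string>) => string;
}

const prompts: PromptTemplate[] = [
  {
    name: "summarize-analytics",
    description: "Summarize a Google Analytics report",
    render: ({ report }) => `Summarize the key trends in this report:\n${report}`,
  },
];

// prompts/list: advertise available templates
function listPrompts() {
  return prompts.map(({ name, description }) => ({ name, description }));
}

// prompts/get: return the filled-in prompt text for the client to use
function getPrompt(name: string, args: Record<string, string>): string {
  const p = prompts.find((t) => t.name === name);
  if (!p) throw new Error(`unknown prompt: ${name}`);
  return p.render(args);
}

console.log(listPrompts()[0].name); // → "summarize-analytics"
```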
be adding called sampling and so sampling is a way for the MCP server to have access to the LLM to make requests so without so the MCP server isn't going to have any like API keys it's not going to have any any like privilege to make uh these calls to the LLM but what sampling allows is basically it sends a
response back to the the agent and just says like "Hey can you make this LLM call?" And so the agent will make that call return the response to the the tool so the tool basically has like a like a remote way of calling the the LLM okay well first of all why is it called sampling that doesn't make any sense to
me that's a terrible name come on anthropic um but maybe I don't fully understand it on the surface seems like a bad name but naming is hard so I'll give you I'll give you the pass uh okay so it's more of a suggestion saying like the MCP server returns this and says run this and then give me the result and
then it passes that result into a tool the tool then returns that response back to the agent something like that yeah exactly interesting all right well you said one other thing root what is root oh roots roots like plants plant roots yeah like plant roots okay um is my Canadian accent coming through is that is that what maybe um yeah
so roots is uh a way to define uh like what context the MCP server has so it would be like uh like a like a directory like a file system directory but it could also be like a like a a URL and then so it would be if you make it a directory it's basically saying like the root of everything that you should know
is in this directory or like the root of everything you should know is like in this like URL and then all of it's like okay so it's like top the top level root of yeah of the the project or the the directory or the URL yeah yeah exactly cool all right well we talked through the spec i learned some more things about the the MCP spec today yeah don't
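And roots, as just described, simply bound what the server may touch — directories or URLs declared by the client. A minimal containment check (the roots and URIs here are invented examples):

```typescript
// MCP "roots" (sketch): the client declares which directories or URLs
// bound the server's world; the server should refuse anything outside.

const roots = ["file:///home/shane/project", "https://mastra.ai/docs"];

function isUnderRoots(uri: string): boolean {
  // exact match, or strictly inside the root (note the trailing slash,
  // so "file:///home/shane/project-evil" does not pass)
  return roots.some((root) => uri === root || uri.startsWith(root + "/"));
}

console.log(isUnderRoots("file:///home/shane/project/src/index.ts")); // → true
console.log(isUnderRoots("file:///etc/passwd"));                      // → false
```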
worry i learned these things recently too yeah i mean we're all we're all learning these things so what uh but what can we build let's let's talk through I'd love to build an MCP server that's some kind of fun example of something you know so ultimately you know we're just going to build an MCP server that has some Mastra
tools and get it deployed i guess chat if you have anything you know any ideas please let us know otherwise Daniel did you have any ideas or otherwise we can we can brainstorm live right here on this live stream oh I I came idea-less okay well then chat we're going to need help because I'm not you know I I'm an hour and a half in my creativity juices are already limited and so they're
they're not saying we have a lot here uh of course we can start with just the basic of just getting the MCP server up and pass in the simple weather tool i think that's a good starting point and then let's maybe see if there's some other uh examples that we can we can kind of pass in cool did you want to to drive this and I'll I'll be your guardian angel
i wasn't planning on it i was not planning on it but you know what why not it might be good to for me to see uh somebody building this too that I can if there's any paper cuts i'm sure there will be things don't make sense all right chat we're doing this you get to see me struggle yeah this is uh AI agents hour or uh putting Shane on the spot hour hey I'm here for it
uh let me uh you know clean up all the bad code I was writing so no one can see that and I will uh share my screen and let's see what we can build yeah we basically just want to show if Shane can do this you can do this this is actually you know a fun that's our litmus test a fun thought experiment because if I can do it I have a good feeling that you are going to be able to do it as well so all
right we will uh we will do it we'll do it live before we do if you are just joining us welcome to AI Agents Hour we're an hour and a half in we are about to build an MCP server so if you want to build an MCP server then uh follow along you can code alongside me we're going to see if we can if Daniel can talk me
through it he's he's from the master team he's built a lot of our MCP functionality so he's going to help me out but if you have wanted to build an MCP server now is your chance um we do have a suggestion uh you know maybe we can if we get if if we get past the basics but a tool to get the latest Monster release notes or recent live stream links oh I like that last
one we I want to build a tool that can just return links to the last x number of live streams so we could probably scrape that from YouTube or you know like there's got to be a way right we got to get that that data exists somewhere that seems like a fun example i don't know how we do it but might be able to pull it from the Mastra YouTube page or
something so that that's the that's the second tool we're going to add and then that way people can add this add the you know just ask what's the latest live stream link and they can get a link to the live stream on YouTube okay I like that that's a good idea sweet uh let's do it but we got to get past the first part which is just me actually you know
getting this thing working all right so let me know how this looks i'm going to probably need to zoom a bunch uh I think it looks good okay all right let's do it okay here we go we're going to do npm create mastra@latest right looks right let's run it we're going to call this my MCP let's or let's let's call this uh live stream MCP
already I'm thinking of something that I need to add i don't think in this uh create command it scaffolds an MCP server i'm not sure if that's an option no no i don't think it is but that's okay we don't need it i I can copy from from docs i'm not above copying from docs if the docs are good you know i don't know i think they're good well we're
gonna test them if not we we're going to talk to Paul so Paul if you're still watching we're going to we might have some docs to fix we're going to find out all right so do we do we need this stuff no um I mean we can let's do an agent because uh one thing we added uh recently is that you can pass an MCP server directly into Oh no that's that the Mastra instance never mind yeah I
guess we don't need agents all right well we're going to just we'll use I mean we don't really even need a lot of this stuff right so we'll Yeah I'm just going to skip this stuff sure i'm going to make I'm going to make Cursor a Mastra expert we're going to test if Cursor can write this MCP server for me because you know why not
why don't we just vibe code this thing all right so I'm going to go into here i do have to just in case we do use an agent i am going to just copy in my environment file all right now let's do open it up in cursor could have used uh wind surf but I'm deciding cursor's cursor is my jam today all right are you like 5050 on both of
them so I use uh I try to play around with both so I know like I have opinions on which one's better i basically if I'm ever building a master project and I have a front end and a back end I will open up my front end in Windsurf because it has the preview stuff and you can like tell it to fix frontend things a
little better and I'll open up my backend in cursor because cursor I just like that I have project level support for the Mastra MCP docs server not that I could install it globally too it'd be fine but that's how I typically split up just so I can test both i don't know interesting okay no rhyme or reason why
i just like to play around with both and see um I'm not going to do an update we'll do that we'll save that for later all right so do I have any tools is there any tools actually in here no all right so I want to create a tool this there's no way this is going to work but I just want to see uh for for what it's worth I always set the model
to to Gemini Pro i find that one works the the best rather than auto uh like right beside Yeah like if you deselect then like Gemini 2.5 Pro Max all right we'll try it that's good create me a Mastra tool hey hey Obby's not here we we almost went the whole stream without a swear word dropped i got to
make the disclaimer this is not necessarily a family-friendly stream but it's more family friendly when Obby's not here yeah this ain't your family stream create me a Mastra tool that goes out to the Mastra AI YouTube live page and returns a list of YouTube links for the last 10 live streams the live stream live stream videos are listed at
I have no idea this is going to work but why not why don't we just try it youtube yeah I've never actually used the uh the YouTube API so yeah well I mean I think we hopefully we don't have to use an API and we can just like fetch from the page but I don't know we'll see i don't know if that's going to work oh I see yeah like fetch that yeah just do a I'd like to just do a fetch and just see if it can return the last x number i don't but
I doubt that's going to work we'll see might we might have to use the YouTube API but you know we'll figure that out put the tool in the tools folder in a file called sure if this doesn't work I'm gonna blame Gemini and say it would work if we went with Claude i mean open up Windfur try Claude the tools directory does exist so it already
messed up there i don't know this is I think it's making stuff up uh yes because it should be create tool from it is making things up so we need to run this hopefully it'll run the Mastra docs fix it fix it it might be able to work through this yeah we'll give it a chance look at that it's getting closer it's getting better we don't have puppeteer which makes
sense uh we can try to add puppeteer and see yeah let's run it let's do it so puppeteer will make it so rather than just fetching from the URL which I guess you know I don't know if if the links are actually available in the HTML or if we need to use Puppeteer to actually get it to load the Puppeteer types are included okay let's run that let's just let this
thing go all right so creates a tool called live stream tool fetch the last 10 live stream YouTube links from the Mastra AI channel launches a puppeteer instance goes to this page waits for the selector video title link returns 10 elements i mean seems like it should work i mean the code the code looks good to me but I I have a gut
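The tool under review presumably has a core like the following — a hedged reconstruction, not the actual generated code. This is the plain-fetch variant they debated before reaching for Puppeteer; the channel URL, the `videoId` pattern in YouTube's raw HTML, and the function names are all assumptions.

```typescript
// Reconstruction (sketch) of the live-stream tool's core: fetch the
// channel's /streams page and pull video IDs out of the raw HTML.
// Mastra's createTool wrapper is omitted; parseVideoIds is the part
// worth unit-testing on a fixed string.

function parseVideoIds(html: string): string[] {
  const ids: string[] = [];
  const re = /"videoId":"([\w-]{11})"/g; // YouTube IDs are 11 chars
  let m: RegExpExecArray | null;
  while ((m = re.exec(html)) !== null) {
    if (!ids.includes(m[1])) ids.push(m[1]); // dedupe, keep page order
  }
  return ids.slice(0, 10);
}

async function fetchLiveStreamLinks(): Promise<string[]> {
  // Channel handle is an assumption; works on Node 18+ (global fetch).
  const res = await fetch("https://www.youtube.com/@mastra-ai/streams");
  const html = await res.text();
  return parseVideoIds(html).map((id) => `https://www.youtube.com/watch?v=${id}`);
}

// Usage: fetchLiveStreamLinks().then(console.log);
const sample = `"videoId":"dQw4w9WgXcQ" ... "videoId":"dQw4w9WgXcQ" ... "videoId":"abc123XYZ_-"`;
console.log(parseVideoIds(sample)); // → [ "dQw4w9WgXcQ", "abc123XYZ_-" ]
```

If the IDs don't appear in the initial HTML (they usually do, inside embedded JSON), that's when a headless browser like Puppeteer becomes necessary, as it did on stream.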
feeling we we we have a thing you know for for those of you just watching if it if it works the first time everyone takes a drink of whatever is sitting next to them doesn't matter if it's water whatever because it almost never does it's it's it's a drink worthy occasion if it works the first time okay well you know what i'm I'm starting to unscrew the lid of my water bottle i'm
getting ready okay well I'm going to move it because it obviously put it in the wrong spot that's fine i wasn't very specific i should have told it where I wanted it to be and let's see i don't know if this is going to if this tool is going to show up it's not and I think I know why but just uh I'm just going to pass it to We don't have an agent
uh I'm just going to call this test agent create me an example test Mastra agent in the src/mastra/agents folder in the test.ts file the agent should have instructions to use the Mastra docs please you know what this still counts as the the first try just because this is something that I think is on our end
that the tool isn't there yeah I think we had a we had a regression in the playground that just pro it might be fixed with this latest release that I don't know if it I don't know if the release has gone out yet but it likely will have been fixed it should you should see any tools not just tools that are assigned to agents but right now
there's seems to be a little regression okay let's run the master docs you can see it's going to do some stuff test agent passes in the tool accept this looks good all right using open AI which is fine i have an API key for that uh now I'll register this new agent yep we need to do that so we're good what is this sending a post request
okay sure i mean we could have tested this I guess from the API as well right mhm but easier to test it from the from here we have our agent we have our tool we have our live stream tool all right what are the odds this is going to work all right we're doing this if if if there are you know chat if you're listening if there are 10 YouTube links that show up here we all take a
drink drum roll no error yet okay hold on damn okay wait wait go into one of them let's see if it's for real right in the middle i really hope it's just like a Rick roll that would be That would be hilarious hot damn oh drink that's pretty okay here we go cheers cheers everybody all right vibe coding is great when it works look at that all
right so now we want to build an MCP server so you can hook this up to your Claude Desktop or Cursor and if you're ever bored you just say "Well ideally here's what I'd like to do this MCP server I think would be better if it showed like if there was a parameter if it was an active stream so it would show that there's an active um you know like
we're live right now so I'd want it to show me that hey this link this is the link to the live thing that's happening right now and then also the past links with maybe ideally the date if it told me the date of that stream yes lots of So we got to take a you know all cheers folks agreed you also owe an apology to to Gemini yeah I do drinks up
i I might be a Gemini convert after this you just got to tell it to use the Mastra docs MCP server and then all is well and and by the way Aban I'm going to share this again this was your idea you get the credit for this this great experience we're going through right now so thank you for the suggestion okay let's keep going we So
can we can we do that quick let's see if we can update this so it returns the active uh if there's a live stream going on now I guess we should probably look at this page i mean there is maybe a way to tell that it's live and then also the the last 10 that are Was the first link the live one well let's look because we
might already have it yeah we just Yeah I guess we could just But we need to know there has to be some indication that it is live i'm not I'm not I'm not live 24/7 you know oh there we go this is really meta too did you hear that it's like it's like let's see let's see how much of an echo we can get how how delayed is this live stream but it works okay this is
cool all right we are almost at two hours and now I'm just like we just got to we got to finish this thing so we're gonna we're gonna keep going okay let's go and let's um do this and let's see if Gemini can help me i want to return an object that because it's an object that has two properties active stream past streams
the active stream should be a link to the live to a live stream if it is live otherwise it should be empty null empty should be empty sure the past streams should include the link for for the last 10 live streams not including any not including the active live you should include the uh like the title or
something in there too yes in we should include the title and the link to the video in the results all right Gemini let's do this we We've never on this stream ever had two drink worthy occasions so if it gets this right I must know that I'm uh pretty dehydrated you need you need some water okay so gave me a type a title and a link i like
that it gives me an output okay an array i like that i'm just going to accept everything we're going to go for it how does it Let's see how it actually tells us something's live just because I like to know what the code's doing a little bit active this is Okay so it's looking for that live badge in the markup in the
actual like HTML which is cool that's that's how I would do it simple we don't get the active stream all right little more comments than I would write in my code i'll you know be honest but you know we'll we'll run with it let's see what it does okay moment of truth all right stay stay thirsty my friends let's see
okay well it failed it's close it is close but this one should be showing up in the active stream so there's the detection of which if whether it's active or not is not working let's try to figure out why how there's got to be how do I get to this tag thumbnail overlay time yikes this is some gnarly markup
which is fun and it's an SVG okay oh it's not able to find let's just see what it's trying to do i suppose I could just ask ask Gemini as well yt badge supported renderer let's just uh let's just see see if Gemini can solve this i'm going to copy the HTML and say the active stream is not correct i like to say please to my to my LLMs
you really never know you gota Yeah you got to stay cordial yeah and Abanet yes that's good call that's that's what I That was my thought too let's try that and Brandon title and date yes i think we should add the date eventually which we could probably just extract either from the title or maybe there's some uh
some metadata there we can grab all right this is kind of hard to read because it's zoomed very far let's just see you're right so using a different selector should be more accurate well I just need it to be accurate it doesn't have to be more accurate it just has to work we'll be the judge of that yeah i mean just you just have to work i
mean it's either this is this is a boolean either works or it doesn't okay let's try this again and all right well done gemini coming in clutch let's just make sure that it's actually working correctly yes that's pretty sweet that's a pretty good moment click that uh it either works or doesn't all right let's
make sure this is May 19th which it should be it is all right now is there Let's look on this page whoops and is there Does it tell me the date just tells me when it was streamed so I mean I tells I have the date in the title I guess for most of these not all of them though so I wonder if there's a way to tell
date not really probably not we'll leave that for now it's It's good enough for now we got to we have more things to do we got to get this thing into an MCP server but yeah we got bigger fish to fry ideally we would get the date without having to use the YouTube API we just want to scrape this thing looks good all right now I suppose we need we
need to create this into an MCP server yes the thing we're here for let's do it all this was just to lead up to this moment like to give it some encouragement great work now we need to get this to Should I be concerned that you talk to Gemini the same way that you talk to us hey you know I I like to manage my my
LLMs the same way I try to try to manage my uh friendships please create me a new MCP server with this live stream tool and export it from the main Mastra class i feel like I have some advantages because I know kind of what I need i can tell it what I I know what I need it to do for the most part where some people if you didn't have this you'd have to
rely on the docs but let's see if it works although I will say I've never uh created this in MCP server before so you know I kind of know how first time for everything i I know how it's supposed to work you know but I have not done it okay need to add MCP that's cool i agree do we need to run this yes run it let's do it i also don't know like how's it getting the version numbers is it Is
that right is that the right version uh that's a good question let's see brandon says "This is a Turing test to make sure Daniel isn't a robot." Uh I wonder how I'm doing but I don't think it's doing that good okay mcp server live stream tool creates new MCP server okay let's let's test this thing okay so this is created so the latest is
0.5 so it's not even close all right well it it might be like somewhere in our docs that it found that version or something yeah 0.5.0 let's do it and then let's let's just accept everything because that's
what we like to do. npm install. We'll get these updated. This is why, you know, Cursor should just run npm install @mastra/mcp, then you get the latest, right? Instead of just trying to add the package with a specific version number. That seems like a poor choice. I wonder... I've had it do the opposite for me. I've never had it add it into the package.json, I've had it just say, run npm install @mastra/mcp. So I wonder why it varies. I've seen it both ways. Okay, so it didn't really do exactly what I asked it, though. Don't we need to pass it in to the Mastra instance? Isn't there like a... Yeah, mcpServers, although that is very new, so I'm not sure if we actually have it in the documentation yet. Is this an object? And obviously this has to be above it. And I don't think I need to export this MCP server, right? Um, no, something like that. And I don't need to do this, right? Because... No, it's all handled for you.
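For anyone following along at home, here's roughly the shape we landed on, as a minimal sketch. It assumes the current `@mastra/mcp` API with its `MCPServer` class and the `mcpServers` option on the `Mastra` constructor (which, as noted, is very new, so check the docs), and `liveStreamTool` here is a stand-in for the scraper tool built earlier in the stream:

```typescript
import { Mastra } from "@mastra/core/mastra";
import { createTool } from "@mastra/core/tools";
import { MCPServer } from "@mastra/mcp";
import { z } from "zod";

// Stand-in for the live stream scraper tool we built earlier;
// the real version scrapes the YouTube channel page
const liveStreamTool = createTool({
  id: "get-live-streams",
  description: "List recent AI Agents Hour live streams",
  inputSchema: z.object({}),
  outputSchema: z.object({ streams: z.array(z.string()) }),
  execute: async () => {
    return { streams: ["AI Agents Hour - May 19"] };
  },
});

// Expose the tool over MCP so Claude Desktop / Cursor can call it
const liveStreamServer = new MCPServer({
  name: "live-stream-server",
  version: "0.1.0",
  tools: { liveStreamTool },
});

// Registering the server on the Mastra instance is all that's needed;
// you don't export the MCPServer separately, it's handled for you
export const mastra = new Mastra({
  mcpServers: { liveStreamServer },
});
```

With this in the main Mastra file, `npm run dev` should pick the server up, which is what we try next.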
This is it, right? Let's see what happens. All right, well, I should be able to run npm run dev. And I do realize I forgot I actually have another meeting that I have to go to, so we might have to close this out here in a minute. Yeah, you probably want to make sure you're not going to live stream that meeting. Well, hey, you never know, sometimes... could be a fun meeting. All right, so if I open up the Mastra playground... I don't see... Do we have it? I thought we had MCP servers over here, but I don't see those. Is that only in alpha? Oh, it might be in alpha. Yeah. Okay, so I need to
install... Is it... I guess if I look here, would this be core? I think it's mastra, because I think that's where the playground lives. But I pretty much need to... if I do alpha on one, I should be right. Yeah, I think so, basically, to make sure the packages are all aligned. And then... hmm, "may not be a valid tag name." Let's just try like this. Invalid tag. Let me find... okay, so I'll paste this here: it's @0.9.1-alpha.6. I thought, though, if I just installed 0.9... Oh wait, here, let's... Oh, because the tag should be... Yeah, I should be able to just get the... Let's go back and try. Um, I think if you just do alpha in these instead of any versions, it should work. You mean if I just put alpha here? Yeah, because then it'll fetch the tag. All right, did that work? I don't know, that was suspiciously fast. Yeah, I don't think
it worked, but we're going to try it and see. Nope. Okay, well, here's what we're going to do: I'm going to get this working, and we're going to show it tomorrow on the live stream. And as a way to wrap this up, if you want to see this, if you want to test out this MCP server, go on to X, follow me, and I will tweet this out later today, hopefully after this next meeting. I will get this deployed and I will tell you how to do it, so you can all get the live streams directly in your Claude Desktop or Cursor or whatever, because I know that's what the people want. I think you've got to give them what they want. I'm mostly making that up, but um, thank you for everyone
tuning in. This has been AI Agents Two Hours, because that's how long we've gone today. Thanks, Daniel, for helping me build an MCP server. We do have it running locally, we just don't have the playground that shows it, so we will get those packages updated and maybe show what we actually built again tomorrow. So if you tune in tomorrow, we'll give a little quick summary of where we left off so everyone can see it. And like I said, I'll tweet it out. Today we talked about what's happening in AI. We met with Paul and got to see his Google Analytics agent that, you know, only a DevRel or marketer would care about or want to build, but we built it
and it worked. We also learned some machine learning, or ML, from Andy from Osmosis. So thanks, Andy (Professor Andy, as I like to refer to him) for teaching us some things and making my brain almost explode; I feel slightly smarter. And thanks, Daniel, for coming on the stream and teaching us how to build an MCP server. Thanks for having me. All right, everybody, please make sure you're following us on YouTube, on X, on LinkedIn, and we will see you next time. Goodbye!
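The tag-based install we were fumbling with at the end of the stream boils down to npm dist-tags: a tag like `alpha` can stand in for a version number, and npm resolves it to whatever prerelease the tag currently points at. A rough sketch, assuming the `@mastra/*` package names used on stream (verify which tags actually exist before installing):

```shell
# See which dist-tags a package publishes (e.g. latest, alpha)
npm view @mastra/mcp dist-tags

# Installing by tag instead of a hand-typed version keeps the
# playground, core, and MCP packages aligned on the same prerelease line
npm install mastra@alpha @mastra/core@alpha @mastra/mcp@alpha
```

This avoids the "invalid tag name" trap of pasting a partial version string like `0.91-pha.6` into package.json.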


