
Build Production-Ready RAG Applications with Mastra

March 13, 2025 · 4:00 PM UTC · 1 hour

Retrieval Augmented Generation (RAG) has become essential for building AI applications that can effectively leverage your data. In this hands-on workshop, you'll learn how to build sophisticated RAG systems that go beyond basic implementations.

Learn the proven techniques that make RAG systems reliable and effective, including optimal chunking strategies, modern retrieval methods, and hybrid search approaches that combine the best of semantic and keyword search. Join Mastra.ai to understand the architectural decisions that can make or break your RAG application's performance.

You'll get hands-on experience with:

Evaluating when RAG is needed vs. when context windows suffice

Advanced document chunking strategies for different content types

Sophisticated semantic search and embedding techniques

Hybrid retrieval systems combining keyword and semantic search

Re-ranking strategies to improve result relevance

Performance optimization techniques for production environments

This workshop is ideal for developers and AI engineers who want to build more sophisticated RAG applications. Participants should have familiarity with JavaScript and understand fundamental RAG concepts. Come prepared with a code editor—you'll leave with working implementations of advanced RAG components that you can integrate into your projects.

Move beyond basic RAG implementations and learn how to build systems that can handle real-world complexity. Join us for this practical session where you'll implement production-ready RAG solutions using proven architectural patterns.

Workshop Transcript

0:28

Hey everybody, we'll get started in a few minutes here. Oh, all right, yeah, I guess, Sean, thanks for kickstarting us. We'll give everyone just a few more minutes; we're probably going to have a lot of people coming in a few minutes late, and that's pretty normal. But if you want to drop in the chat where you're calling in from, that

5:16

would be great. I am currently in San Francisco, California, but I imagine we're going to have people from all over the place. We also have Nick from our team joining us; he's the Mastra resident RAG expert, so he'll be helping show a bunch of the code samples today. I'll be talking a little bit about RAG, he'll jump in and

5:40

show us some stuff, and we'll have plenty of time for questions along the way as well. We have some Atlanta, Dallas, Poland, Oregon, Kansas City, Jamaica, Ukraine. I think, Nick, you're out near Austin, right? "Yep, in the Austin area." So, Daniel, yes, we will; I'll send out the link to the GitHub

6:29

once I hand it over to Nick. Everyone will get a copy of this recording, the link to the repo, and the link to the slides; all that stuff will come in an email, usually a few hours after the event, but I'll also drop it in the chat here in just a little bit. We'll give everyone maybe one more

6:50

minute just for some of the latecomers to get in, and then we'll go ahead and get started. All right, welcome everybody, let's get kicked off here. I will share my screen. It's great to see you all, and for all those watching the recording, thanks for watching. We'll get started here,

8:19

give me one second. All right, hopefully you all can see my screen; let me know if not. I will assume everyone can, but I have been known to share the wrong screen from time to time. All right, welcome everybody. Let's talk about the goal for today: we're going to learn how to build RAG applications with Mastra. We're

8:55

going to go through and talk about a lot of the basics of RAG, starting at the beginning. We're going to spend some time talking through some of the use cases for it, I'll show some slides, and then we'll actually dig in and you'll get to see some code of how it all

9:21

actually works within Mastra. Whether you're using Mastra or not, you should learn quite a bit about retrieval augmented generation in general. The original title of this was "productionizing RAG applications"; that's really hard to teach in one hour, so I don't think we're going to get fully

9:39

that far, but hopefully this will be a really good kickstart to your journey. Whether you're just starting with RAG or you've been spending a little bit of time with it, you should be able to get quite a bit from this. Some of the things we're going to learn: we're going to talk about some core or common RAG concepts

9:56

like chunking, embedding models, retrieval, and reranking; we'll talk about fine-tuning and a little bit about synthetic data; and then we'll show RAG in Mastra. So who are we? I am Shane Thomas, founder and chief product officer at Mastra. Formerly I was in engineering and product at Gatsby and Netlify, I built a site called Audiofeed, and I've been doing open source stuff for 15 years, way back when

10:23

Drupal was a thing; I spent quite a bit of time in the Drupal community back in the day. Reach out to me on X (or Twitter, whatever you call it these days) and LinkedIn as well. We also have Nick here, who's going to jump in and show us some code. He's a founding engineer at Mastra, he was previously at Netlify, and he's excited

10:43

about RAG, as I think a lot of us are if we're here, so you can reach out and connect with him as well. But let's talk about, at a high level, what exactly RAG is. There are two tracks on this diagram; I pulled it right from the Mastra home page because I think it's a pretty good high-level diagram of what RAG actually is. If you look at the bottom part, you

11:06

have some data, some knowledge base, and you go through this process of putting it through an embedding model and storing it in a vector database; that's the whole bottom pipeline. Then you have this more real-time pipeline where someone actually has a query, some kind of prompt; it goes through an embedding model, and you do some retrieval, which uses your vector

11:31

store, you pass that data to an LLM, and you get output. We'll talk a little bit more about this, but that's the general process of how RAG, specifically using embeddings and vector databases, actually works. One pet peeve I do have about RAG: when people hear RAG, they think it's only for

11:56

vector databases. That's the most common perception, but it doesn't have to be just with a vector database. If you use a more traditional Postgres database and do some kind of retrieval and pass that information into the context window of an LLM, you're still doing RAG;

12:15

that's still RAG. But in this case we're going to be spending a lot of time on vector databases today, since that's the most common use case and what most people think of. So what is RAG? It's what I just described: a way for you to get information and pass it into a model so the model can give you better

12:38

results. One of the problems is that you might have external documents that the large language models are not trained on, and you need to get that information, from wherever that documentation or data lives, into the context window of an LLM. This also really helps us get around context window limitations: context windows are only so big. There are some

13:03

really large models, like Gemini and others, that allow you to really flood the context window with a lot of information, but even if you can get it all into the context window, it doesn't always give you the best accuracy. So being able to have a system where you

13:20

get just the right results, and can then control how those get passed into the context window, can certainly improve accuracy and even latency on some of these things. That's one of the reasons RAG is still really relevant and really important today. So let's talk a little bit about chunking.
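That "control how results get passed into the context window" step can be sketched in a few lines of TypeScript. This is purely illustrative: the `buildPrompt` helper and the chunk shape are invented for the example and are not a Mastra API.

```typescript
// Minimal sketch of the RAG prompt-assembly step: take the chunks a
// retriever returned and place them in the model's context window.
interface RetrievedChunk {
  text: string;
  score: number; // similarity score from the vector store
}

function buildPrompt(question: string, chunks: RetrievedChunk[]): string {
  // Highest-scoring chunks first, so the most relevant context
  // appears earliest in the prompt.
  const context = chunks
    .sort((a, b) => b.score - a.score)
    .map((c, i) => `[${i + 1}] ${c.text}`)
    .join("\n");
  return `Answer using only the context below.\n\nContext:\n${context}\n\nQuestion: ${question}`;
}

const prompt = buildPrompt("What is RAG?", [
  { text: "RAG passes retrieved data to an LLM.", score: 0.9 },
  { text: "Vector databases store embeddings.", score: 0.7 },
]);
console.log(prompt);
```

The prompt string here is then what gets sent to the LLM as its context.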

13:40

Chunking is the process where you take these big documents, all this data (maybe a whole bunch of web pages, maybe an internal knowledge base, whatever it might be), and break that content down into chunks. Those chunks eventually get turned into embeddings, and those embeddings get stored in the actual

14:01

vector database. But when you do retrieval, you're actually operating at the chunk level; you're doing some kind of query that returns the chunk. So it's really important to think through your chunking strategy and how you want to break up the documents, whether it's by sentences, by

14:18

paragraphs, something that's semantically meaningful. You do have this balance: if your chunks are too large, you may have unrelated information; if they're too small, you might not have as much context as needed. So you're trying to find the right balance, and this is why it becomes a very challenging optimization game.
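To make the size/overlap trade-off concrete, here is a naive fixed-size character chunker with overlap; it's a sketch for illustration, not how any particular library implements chunking.

```typescript
// Naive fixed-size character chunking with overlap. Overlap lets each
// chunk carry a little of its neighbor's context. Assumes overlap < size.
function chunkByCharacters(text: string, size: number, overlap: number): string[] {
  const chunks: string[] = [];
  const step = size - overlap;
  for (let start = 0; start < text.length; start += step) {
    chunks.push(text.slice(start, start + size));
    if (start + size >= text.length) break;
  }
  return chunks;
}

const chunks = chunkByCharacters("abcdefghij", 4, 2);
// Each 4-character chunk shares 2 characters with its neighbor.
console.log(chunks);
```

Larger `size` values pull in more surrounding (possibly unrelated) text; smaller values lose context, which is exactly the balance described above.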

14:44

At higher levels, when you're doing RAG, you actually have to test different chunking strategies. We'll talk about different embedding models, but there's a process where you'll have to spend some time going through this. You do end up storing additional metadata with each chunk, which helps you work out which semantically

15:02

similar chunks might be. You can overlap chunks or retrieve neighboring chunks; there are a bunch of different strategies for the retrieval part to help make sure you're getting enough information to pass into the model. When you think about embedding models, that's the second step:

15:18

first we do some chunking, then we take those chunks and do the embedding part. Embedding models are just specialized ML models that take your text and turn it into numerical vectors, numbers that represent semantic meaning. What this allows you to do is semantic retrieval, or semantic search. If

15:44

we were doing keyword-based search, I might be searching for the word "puppy", and I would naturally expect the word "dog" to be similar to "puppy", but those aren't the same words. So with keyword-based search, it wouldn't return any documents that only use the word "dog". But if you use embeddings,

16:08

"puppy" and "dog" become semantically similar, so you can retrieve chunks that maybe only contain the word "dog". This is what semantic search is, and why RAG unlocks it and makes your results better when you're passing that information to LLMs.
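Under the hood, "semantically similar" usually means a high cosine similarity between embedding vectors. Here's a toy sketch with made-up three-dimensional vectors; real embedding models produce hundreds or thousands of dimensions.

```typescript
// Cosine similarity: dot product of the vectors divided by the
// product of their magnitudes. Values closer to 1 mean more similar.
function cosineSimilarity(a: number[], b: number[]): number {
  const dot = a.reduce((sum, v, i) => sum + v * b[i], 0);
  const magA = Math.sqrt(a.reduce((s, v) => s + v * v, 0));
  const magB = Math.sqrt(b.reduce((s, v) => s + v * v, 0));
  return dot / (magA * magB);
}

// Hypothetical embeddings: "puppy" and "dog" point in nearly the
// same direction; "spreadsheet" points elsewhere.
const puppy = [0.9, 0.8, 0.1];
const dog = [0.85, 0.75, 0.15];
const spreadsheet = [0.1, 0.2, 0.95];

console.log(cosineSimilarity(puppy, dog)); // high, near 1
console.log(cosineSimilarity(puppy, spreadsheet)); // much lower
```

Retrieval then amounts to ranking stored vectors by this similarity against the query's embedding.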

16:34

As I mentioned, there's exact matching: that's keyword-based retrieval, which is how we've been doing search for a very long time. Then there's semantic search, which uses this embedding-based retrieval. And sometimes you might want both: there's a best-of-both-worlds approach where you do keyword plus semantic search, and maybe you pass a little bit of both into the context window of the LLM.
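One common way to combine keyword and semantic result lists is reciprocal rank fusion; the talk doesn't name a specific method, so treat this as just one illustrative option.

```typescript
// Reciprocal rank fusion: each document's fused score is the sum of
// 1 / (k + rank) over every result list it appears in. Documents that
// rank well in both keyword and semantic search float to the top.
function reciprocalRankFusion(lists: string[][], k = 60): string[] {
  const scores = new Map<string, number>();
  for (const list of lists) {
    list.forEach((id, rank) => {
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + rank + 1));
    });
  }
  return [...scores.entries()]
    .sort((a, b) => b[1] - a[1])
    .map(([id]) => id);
}

const keywordResults = ["doc3", "doc1", "doc4"];
const semanticResults = ["doc1", "doc2", "doc3"];
console.log(reciprocalRankFusion([keywordResults, semanticResults]));
```

Here `doc1` and `doc3` appear in both lists, so they outrank documents found by only one retriever.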

16:51

All of these are retrieval techniques you might use in a production application. Now for reranking: often, after you retrieve relevant chunks, even with semantic search you might not get the actual best results first, so you might go through some kind of reranking model, which is a specialized model. It's a little more, you

17:19

know, computationally expensive, so there's increased cost and latency if you add this reranking step. But you might retrieve the 10 most semantically similar chunks, and out of those, chunk five or six is maybe actually even better for your query; by running them through this extra process, it'll rerank those chunks and then pass the best chunks

17:47

to the LLM to actually get the output. So it's important to think through reranking. It's actually one of the first things we see people do when they want to productionize and get better quality results: first they increase the number of chunks retrieved, which has some drawbacks, and then they add this reranking step. So it's one of the first

18:12

things we typically see people do when they're trying to improve the accuracy of their RAG pipelines. Now, the concept of fine-tuning is hotly debated, specifically fine-tuning embedding and reranking models when you're thinking about RAG, and it can be important when you get into production applications. We're not

18:35

going to talk about it too much today, but when you get to a very high level of scale and you really care about accuracy, you may want to consider taking the data you do have and improving your embedding model or your reranking model through some kind of fine-tuning process. The reason that could be useful is you may have a

19:01

specialized use case that the embedding model is not trained on. If you're trying to do some kind of legal search, looking at actual legal documents or actual laws passed by the government, maybe the embedding models weren't trained specifically for that task. By doing

19:21

some kind of fine-tuning, you can now more accurately create embeddings for that specific type of data. So if you have a specialized use case that a general approach doesn't cover, you can really see some good results here, but you do need a decent amount of data. I've heard different numbers from different people; it does

19:38

depend on the use case. I've talked to someone who said that with 6,000 data set pairs they could increase accuracy by quite a few percentage points, and I've seen others who said it took them 100,000 pieces of data before they really started to see significant results that made it worth it. So it does depend on the use case, and synthetic data can help

20:03

here: if you have some data, you can use LLMs to generate other similar types of data that can then be used in the fine-tuning process. With that, let's talk a little bit about Mastra, and then I'll hand it over to Nick to get us actually looking at some code. Mastra is an open source AI agent framework

20:22

for TypeScript. It comes with agents that have tools, memory, and tracing; it has state-machine-based workflows that let you do human-in-the-loop; we have evals for tracking and measuring AI output; we have storage for RAG pipelines, which is what we're going to talk about today; and we have all this in a local

20:42

development playground. With Mastra, our goal is to be opinionated but flexible: it allows you to get further faster, but we don't want to lock you in. In the case of something like RAG, this might mean you use Mastra out of the box; by default you can just start using it and you have a vector database right there. But maybe

21:02

when you productionize, you decide to use Pinecone or some other vector DB; that's fine too, you can easily swap those things out. We do have the opinions that get you there faster, but you're flexible enough that you're not locked into our decisions. Really, we like to think of it as batteries included: it all

21:20

works out of the box, but you can of course make it suit your needs. So I'm going to hand it over to Nick, and we're going to do a little bit with RAG: we'll talk about chunking, embedding, retrieval, and reranking, and then we'll talk about agent tool usage with RAG and what that means. So, Nick, I'm going to toss

21:39

it over to you. Cool, thanks, Shane. Yeah, let me share my screen; one second. Cool, all right, perfect. So I have a bunch of examples of basically each step of RAG, and I want to go through each of them with you. We're going to start with chunking, and I'll show you different types of chunking: there's character chunking, there's recursive

22:07

chunking, there's JSON chunking, and there's markdown chunking. Then I'll show you embedding: there are different ways you can embed and different embedding models you can use, so we can compare and see what happens when you embed and how to embed. After that I'll show you vector upserting: after you embed, you can upsert

22:31

that data into your vector DB. After that I was going to show you vector search and then vector reranking, so basically querying the vector database and showing vector reranking. And then we'll tie it off with some examples of using this with agent generation, and that's going to be exciting to show you

22:57

guys. And Nick, can you make your code just a touch bigger, and maybe hide the sidebar or something? That way hopefully you all can see it; make it just a little bigger so everyone can see it well. Yeah, that works for me; if it doesn't for anybody, drop it in the chat and I'll let

23:17

Nick know, but yeah, that looks good. Okay, cool. So, starting off with the character chunking; let me close all these. Character chunking is pretty much the most basic version of chunking, and the way Mastra does it is we'll create something called an MDocument. The MDocument is basically taking whatever

23:41

document you want to chunk, storing it in the MDocument, and then you can use different chunking strategies to get the exact chunks you want. For this example, a character chunking strategy would just look at the document and split specifically on characters. So for a size of 20, it would look at this text, which is

24:06

"This is a simple text document that will be split into chunks based on character count", and the chunking strategy is very simple: it'll take 20 characters, that's the chunk size, and it'll split based on that. To show you that in action, let me do that right now. I'll take

24:27

this one and make it bigger too. Okay, so if you look here, after we split the chunks, you'll see that each chunk has its own ID and contains various information about the chunk itself. We have the text, and we have a metadata separator, which basically explains how chunks would

24:56

usually be separated, but the most important part is really the text. You can see it splits on the 20 characters, where it's "this is a simple text", then goes to "document that will be split into chunks based on character count", and you can see I logged the exact chunk text that will be shown afterwards. So this is a very

25:23

basic version of that. Now, if we go to the next example, it's a little more complicated. This is the recursive chunking strategy, and it's a little different: instead of just looking at character length, it will recurse into the

25:46

content itself and try to keep content as semantically close as possible. It'll try to maintain as much of the text context as possible, split on separators, and make sure the chunks themselves have the context they need. So if I do that

26:17

one, I can run this chunking. When we split this code content, you'll see that it tries to keep as much of the context as possible: it'll print out the function definition, print out some of the comments, and start printing out some of the code up to the size limit, then go to the next one. But it'll also keep in mind these separators so that things make

26:53

sense within the context; for example, for this last one it'll keep that whole function together as one chunk. Let me open this up here. Okay, so the next thing I want to show is the JSON chunking. In Mastra right now we have different chunking strategies based on your document type, so we support JSON formats as

27:24

well as markdown formats, HTML formats, and various other formats like that. Right now I'm going to show you the JSON format. We're going to take this JSON content and split it, and JSON chunking is interesting because it will not only split the content into the proper chunks,

27:53

but it will also maintain the nesting format. You'll see here that this content gets split appropriately to maintain its nested fields, so you can clearly see that one chunk goes config, API endpoint, and then you get all this data, while the other one would be config, features, and maintains that data the same way.
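As a simplified sketch of what nesting-aware JSON chunking means (this is not Mastra's actual algorithm), each chunk can carry the key path it came from:

```typescript
// Flatten a JSON object into chunks that remember their key path,
// so a value like "config.api.endpoint" stays identifiable after splitting.
function chunkJson(obj: unknown, path: string[] = []): { path: string; text: string }[] {
  if (typeof obj !== "object" || obj === null) {
    // Leaf value: emit one chunk with its full dotted path.
    return [{ path: path.join("."), text: String(obj) }];
  }
  return Object.entries(obj as Record<string, unknown>).flatMap(([key, value]) =>
    chunkJson(value, [...path, key])
  );
}

const chunks = chunkJson({
  config: {
    api: { endpoint: "https://example.com" },
    features: { search: true },
  },
});
console.log(chunks);
```

Keeping the path with each chunk is what lets you reconstruct where a retrieved value sat in the original document.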

28:17

When you split on JSON chunks, you'll still know exactly the order of the content and how it was nested, so it maintains that structure. Okay, then on to the markdown chunking. This is pretty similar, but it takes the markdown content, and specifically for markdown we have headers that it'll make sure to split on to keep certain pieces

28:54

of data together. I can show you guys that right now. So we have these markdown chunks, and it's going to split on the headers: a single pound sign for header one, two pound signs for header two, three pound signs for header three. As you'll see, it'll split;

29:21

it sees this first line and splits on that, as it's one of the headers it wants to focus on, and it'll save that in its metadata. For this next chunk, it'll take this part and make this whole section a different chunk. And then if we keep going down,

29:48

you'll see that basically, depending on the headers you use, it will try to maintain those sections as separate chunks to keep all the context together. So that's chunking.
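A bare-bones version of this header-based markdown splitting might look like the following; it's illustrative only, and a real chunker (like Mastra's) tracks nested header levels in metadata rather than flattening them.

```typescript
// Split markdown into chunks at heading lines (#, ##, ###), keeping
// each heading with the body text that follows it.
function chunkMarkdownByHeaders(markdown: string): { header: string; body: string }[] {
  const chunks: { header: string; body: string }[] = [];
  let current: { header: string; body: string } | null = null;
  for (const line of markdown.split("\n")) {
    if (/^#{1,3}\s/.test(line)) {
      // New section: close out the previous chunk, if any.
      if (current) chunks.push(current);
      current = { header: line.replace(/^#+\s*/, ""), body: "" };
    } else if (current) {
      current.body += line + "\n";
    }
  }
  if (current) chunks.push(current);
  return chunks;
}

const md = "# Auth Guide\nHow OAuth2 works.\n## Sessions\nSession management details.";
const sections = chunkMarkdownByHeaders(md);
console.log(sections);
```

Each section's header can then be stored as chunk metadata, which is what makes the filtered searches shown later possible.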

30:16

Let me go to embedding now. Embedding is interesting because there are a lot of different embedding models you can use, and for this example I've chosen OpenAI and Cohere. OpenAI has a bunch of different embedding models, and the two I'm showing you are text-embedding-3-small and text-embedding-3-large. The small one is the fastest; it embeds the quickest and is easier to use for smaller data set sizes, if you just

30:43

want to worry about speed, while text-embedding-3-large is for when you want higher quality embeddings and more accuracy in your queries. I also included code here just to show that you don't have to use only OpenAI; you can use any kind of embedding

31:01

model you want, and with Cohere you have multilingual support, so when it embeds it keeps the multilingual nature of the chunks together. I also have another example of doing batch embedding with OpenAI, so I can show you the results of that

31:31

embedding; let me make this easier to see. So you can see each one; granted, what we're embedding is pretty small, so you're not going to see much of a time difference, but you can see comparisons of the output of the different embedding models and their dimensions. Right now we're only printing the first five values, but you can

32:01

imagine each embedding will be 1536 dimensions, basically a vector of length 1536, so they get pretty long. And then we can see that you can embed multiple texts at once; each embedding length will be 1536, and you can see how many embeddings you can do at one time.

32:28

Next I'm going to show you guys vector upserting. This is where chunking and embedding come into play. Let me move this over here. We have this document; it's a sample document that just goes over what vector databases are, and

32:57

now you can see in our example we make a document with the type as markdown, as well as our chunking strategy, because it's a markdown file. We get the various chunks and then generate embeddings for each of those chunks right here. Then we set up a vector database instance; right now we're using PgVector to store the vector embeddings,

33:26

and then we create an index for the vector embeddings, just so it's easier for us to query on later. We'll create an index with the name search examples and a dimension of 1536, and we'll upsert each of those embeddings, including metadata that was found in the chunks; we'll include the

33:54

text of the chunk as well in the metadata. So let me show you that right now. You can see that I successfully upserted six embeddings, and basically what that means is that we'll be using this exact data in our next two examples. While we're waiting, let

34:28

me just show you that. This next example is a vector search example. I have two different searches we're trying to do. One is just a very basic search: we ask the question "What are the main features of vector databases?", we embed that

34:58

text with an embedding model, and we use that to query the vector DB. We have a top K of three, which means we only want to return the top three results, and then we'll display those results here.
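The upsert-then-query flow can be mimicked with a tiny in-memory store. Production stores like PgVector add real indexing and metadata filtering; every name below is invented for illustration.

```typescript
// Minimal in-memory vector store: upsert vectors with metadata, then
// query by cosine similarity and return the topK closest matches.
type Entry = { id: string; vector: number[]; metadata: Record<string, string> };

class MemoryVectorStore {
  private entries: Entry[] = [];

  upsert(entry: Entry): void {
    // Upsert semantics: replace any existing entry with the same id.
    this.entries = this.entries.filter((e) => e.id !== entry.id);
    this.entries.push(entry);
  }

  query(queryVector: number[], topK: number): Entry[] {
    const cosine = (a: number[], b: number[]): number => {
      const dot = a.reduce((s, v, i) => s + v * b[i], 0);
      const mag = (v: number[]) => Math.sqrt(v.reduce((s, x) => s + x * x, 0));
      return dot / (mag(a) * mag(b));
    };
    return [...this.entries]
      .sort((a, b) => cosine(b.vector, queryVector) - cosine(a.vector, queryVector))
      .slice(0, topK);
  }
}

const store = new MemoryVectorStore();
store.upsert({ id: "a", vector: [1, 0], metadata: { section: "features" } });
store.upsert({ id: "b", vector: [0, 1], metadata: { section: "implementation" } });
store.upsert({ id: "c", vector: [0.9, 0.1], metadata: { section: "features" } });

const results = store.query([1, 0.05], 2);
console.log(results.map((r) => r.id));
```

The `metadata` field on each entry is where a real implementation would hang the section and header info used for the filtered search example that follows.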

35:21

I'll explain this first, and then we have another query that says "How to implement vector search". It's a pretty similar query, except now we're going to filter on certain metadata inside these embeddings, just to get a more focused answer. So let me try doing that right

35:51

now. You can see, for the first response (let me make this bigger), we're asking "What are the main features of vector databases?", and we get data back from that markdown file we upserted into the vector DB. So we have the example

36:21

of values: we have text, type, source, and section. We get the text back as well as some other information, like headers and section info; all this metadata was generated when we upserted into the DB, so we're getting all that info back. And then for the filtered search

36:47

results, since we're filtering on specific sections, you'll notice that each result is focused around its section; so for implementation, all of them focus on implementation and give us results based on that. And then this continues on

37:13

to the next example, which is vector reranking. The way reranking works in this example is we filter and get some results back from the vector DB, then we take those results and the query we asked and give them to a reranking model; for this example we're using Cohere rerank 3.5, and based on the results we get

37:51

back and the query, it will recalculate the importance of each result returned. So let me show you that in action; let me make that smaller. Okay, cool. First I want to focus on the score here: when we get the results back from the vector DB, we'll get a score, which is basically a cosine similarity score between the query and the

38:32

different vectors we get back. So we see that our initial results have a 0.61, a 0.54, and so on; we have a bunch of different scores. But now we want to rerank them: we don't want the top 10, we want the top three, and we also want the reranking model to change the order depending on which one is most

39:02

important. You'll see that we used to have 0.61 as one of our scores, but the way Mastra handles reranking is that we'll take the initial vector score, we'll take a position score, which is where that vector was in the last results (a position of one means it was the first in the

39:30

listing), and then we'll have it do another semantic score check, basically checking the text of the vector's data against the text of the query and confirming that they are semantically similar. We'll take these values and calculate the new score based on that. So you'll

39:53

see that we used to have a 0.62, but now we have a 0.59, and it's basically the same for each of these reranked results: we'll calculate under the hood what the new score is going to be based on these various values.
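The blended score just described (initial vector score, position score, and a fresh semantic check) could be combined roughly like this; the weights are made up for illustration and are not Mastra's actual values.

```typescript
// Illustrative rerank-score blend: combine the original vector
// similarity, a position bonus (earlier results score higher), and a
// semantic score from a reranking model into one final score.
interface RerankInput {
  vectorScore: number; // cosine similarity from the vector store
  position: number; // 1-based rank in the original result list
  semanticScore: number; // relevance score from the reranker
}

function blendedScore(r: RerankInput, totalResults: number): number {
  // Position 1 of N gets a position score of 1; position N gets 1/N.
  const positionScore = (totalResults - r.position + 1) / totalResults;
  // Hypothetical weights; a real implementation would tune these.
  return 0.4 * r.vectorScore + 0.2 * positionScore + 0.4 * r.semanticScore;
}

const score = blendedScore(
  { vectorScore: 0.62, position: 1, semanticScore: 0.55 },
  10
);
console.log(score.toFixed(2));
```

This is why a chunk's score can drop (or rise) after reranking, as in the 0.62-to-0.59 shift in the demo: the semantic check and position now share the weight with the raw vector similarity.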

40:19

So those are the basic steps for searching, reranking, and upserting. But now I can show you how this works in a more practical sense, so we can move on to a dev preview of this. I have two more examples; okay, let me show you. So I have two previews that basically

40:56

do vector searching and reranking, but this time we're using tools. We created a basic search tool for our agent to use, and it basically takes everything we've done so far: it'll create an embedding from whatever query a user asks, query the vector store using that embedding, and return results, and the agent will use this tool to come up

41:28

with and format its response. I'll start with that one first, and we can show this in mastra dev right now. So I open mastra dev; this is our dev playground, which makes it easier to see these various questions in action. I can go to my basic agent and pull things up; let me move this over here. So

42:00

just as an example, I can pull up an example question to ask it. I could also ask its capabilities, so I'll do that right now and ask what capabilities it has: its capabilities are to explore and understand code bases using retrieval augmented generation, and it can perform basic vector searches

42:22

to find relevant information. So let's ask it a question about certain vector stores. Actually, before I do that, sorry, I'm getting ahead of myself: for the documents that we store in the database, I basically have some test documents just to show what we're doing. This one's an example JSON application

42:53

settings document, so it just has different information about JSON settings. Then we have one about authentication system guides, a markdown file that talks about how authentication works, how OAuth2 works, and how session management works. Then we have an error handling

43:19

guide that goes over different error handling statements, and finally we have a logging guide breaking down how logging works and the logger implementation. These are all just example documents that we fed into our vector DB and that we can now query the agent on. For my first example, I can ask it about

43:46

JWT authentication in our system, and it will look into the docs. Now it's going to give me a summary of all the information from those markdown files, print out different code snippets, and explain in general how JWT authentication works. And because it uses the code that we

44:19

have available, it's able to give a pretty precise answer just based on that documentation. It goes through it and tells me to import the required libraries, tells me about the service class and how to generate and verify tokens, and is able to provide a pretty meaningful explanation,

44:40

as it has access to all the proper documentation. I can do another example of this basic search: I could ask what the standard error handling patterns for API responses are, and it will look into the error handling documentation and be able to print out precise information about what we

45:08

need. So this is the basic search. Let me go back to it: this is a basic search tool and it's pretty powerful, but it does require you to write it out yourself, and it's pretty basic in terms of capabilities: there's no filtering right now, and there's no

45:35

ability to rerank. That's where our vector query tool comes into play: Mastra has a built-in vector query tool that handles this for you. It's basically the same as that search tool, except it also has the ability to filter on certain information and has built-in reranking, so you can

46:06

rerank your results without needing to handle that yourself. I'll show you how this works in practice as well, so we can switch to another agent that utilizes it. We have a vector query agent that is hooked up to the tool, and we can ask it some questions using the same data as before. Well, first

46:39

let me ask what its capabilities are. It has a lot more instructions: it can create filters, it can use specific tools for information retrieval, and it's able to explore the codebase. Now I can ask it for some information about how we handle database errors and validation errors; let me do that right

47:17

now. Now it's able to give me precise information just like before, but with our built-in tool. And I can show you this next one, which is basically how filtering works with this tool. Right now the filter is not enabled, so if I ask this question, it

47:44

will look through different parts of our documentation and return information, but it won't give me exactly what I want filtered: it will show me error handling and auth, when we really only want error handling. So now I enable the filter; this lets you choose whether you want a filter or not.
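The filter behavior being demoed can be sketched in a few lines: each stored chunk carries metadata (like a category), and an optional filter narrows results before ranking. This is an illustrative stand-in, not Mastra's actual filter API:

```typescript
// Each stored chunk keeps metadata alongside its text and similarity score.
type Entry = { text: string; metadata: { category: string }; score: number };

// With no filter, everything is ranked; with a filter, only matching
// categories survive, so answers come from the intended docs only.
function queryWithFilter(results: Entry[], filter?: { category: string }): Entry[] {
  const kept = filter
    ? results.filter((r) => r.metadata.category === filter.category)
    : results;
  return kept.sort((a, b) => b.score - a.score);
}
```

Without the filter, the higher-scoring "auth" chunk would win even when the question is about error handling; with it, only "error-handling" chunks are considered.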

48:11

Now I can ask that same question again, with filtering enabled. Okay, cool: now every response it gets is properly filtering our data, making sure we only get what our query wants. We want to filter on error handling, so all

48:45

our responses are from error handling, and that's working as intended. I think I have time for one more example, so we can show how reranking works with this tool as well. Let me go ahead and show that to you. We have a query right here, which is: find the most relevant code examples showing JWT token verification and

49:18

session validation. Now we have reranking turned on, so it will go through the database (it does this under the hood), fetch results, and rerank to make sure these are the most appropriate and relevant responses. So Nick, is this the last example? Because, to ask the first question, I saw

49:57

there's a bonus folder. What's in the bonus folder of the repo? I know we talked about it before, but I honestly don't know, so I'm curious. Okay, yeah, I can show the bonus example. Maybe real quick; I know we want to have some time for questions, but we can of course

50:15

just send people there if you think that's more relevant, but if you think we can show it in five minutes, that would be pretty cool. Yeah, we can show it in five minutes; let me show it off then. Basically, I made a bonus section that you can explore in the repo when you get hands-on with it. It's very similar to the tools we've been looking at so far,

50:40

except this one is completely focused on how to use this with code. To describe it, it would be like a mini Cursor or Windsurf, where it will take files from these documents. I have a bunch of different actual class files: I have an authentication

51:04

service code file, and then I have error handling, which defines a bunch of different classes and functions, as well as logging. So we have all these set up, and the tool itself will look through this codebase that's been upserted into the vector DB and be able to provide info,

51:34

give you these code snippets, give you examples of functions, and we have an agent set up just for that, giving you examples from each codebase. I can show you that right now; let me go ahead and set it up in the playground. So now we have a code agent that we can use to do specific

52:11

queries. I can do this right now: I can ask it to find an implementation, and we can use this to compare side by side. We have different functions like generate, verify, validate, and invalidate, and we can ask it a specific question about the verify token method. So we could say: find the implementation of the verify token

52:49

method in our authentication system, and now it will look through the codebase that we gave it and provide me the exact code that was in our codebase, then give an explanation of how it's working, what parameters you need, and how it handles different errors. So this is a pretty useful tool, and this was just

53:23

an example of the power you have when you use RAG: you can just give it your code, and it will be able to process that code and help you. It's basically a very good assistant to help you look at and explain different parts of your codebase. Should I show more examples, or... Yeah, I think

53:51

that's good; this has been really cool, very helpful. We have about 10 minutes, so maybe we'll open it up for questions. Obby, thanks for fielding most of the questions as we went, so I don't think there's a lot in the chat right now to answer. There are a few things: yes, the recording will be shared, we'll send

54:09

it out. That's more of a housekeeping question; you'll get it. It does take a few hours before that email goes out, so in a few hours you should all have it. If there are any other questions, though, feel free to drop them in the chat, and we'll stick around for a few minutes and answer any that come in. Okay, so let's

54:26

see the first question here. One question is: how can I retrieve data from the Mastra API stream in a format similar to OpenAI's? I assume it's the standard, but I couldn't find it in the docs. I don't know if I fully understand that. Nick, does that make sense to you? Yeah, or we can ask for more

55:05

information too if needed. Yeah, I need more clarification on that one. Yeah, Bogdan, if you can provide just a little more clarification, maybe we can help with that. How to deploy to Vercel: yeah, there are function limits when deploying to Vercel. We have been improving our deployers this week to try to help with some of those things, but it

55:29

is an issue: Vercel does have some function limits, and depending on how you set things up and what you're using, you might hit those limits depending on your dependencies. That's just kind of an ongoing challenge with deploying agents and these kinds of AI applications on serverless environments. But if you are running into issues with Vercel, you should go to

55:56

mastra.ai, click on request access to our cloud, and send me an email, and I will get you access to see if it works there. Okay, so here's a question from Jeremy: does the create RAG tool output get invoked by the LLM determining it's a tool call to make, or does it get invoked deterministically before the initial prompt is sent to the LLM? Yeah, I was

56:22

actually going to type out a response, but I'll say it on the call. Currently, the LLM determines when a tool call is appropriate to make, and that depends on your query as well as the tool description and its different parameters. In order to help the LLM pick the tool most reliably, it's important to have a good description, a very clear description of what the tool's

56:49

purpose is. Say you wanted to get the weather for a certain location: you want your tool to specifically be about weather information, so you want the description to say something like, this tool gets the weather for a specific

57:12

address, and then you'd want your query to be related, like "what's the weather in Austin" or "what's the weather in New York". That way is the most reliable way to get the agent to call that tool.
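As a rough illustration of why the description matters, here is a toy tool shape and a crude stand-in for the model's tool choice. The `Tool` type, `pickTool`, and the matching heuristic are all hypothetical, not a real framework API; a real LLM reads the description in context rather than string-matching:

```typescript
// Hypothetical tool shape: the description is what the LLM reads when
// deciding whether this tool fits the user's query.
type Tool<I, O> = {
  id: string;
  description: string;
  execute: (input: I) => O;
};

const weatherTool: Tool<{ city: string }, string> = {
  id: "get-weather",
  description: "Gets the current weather for a specific city or address.",
  execute: ({ city }) => `Weather for ${city}: (would be fetched from a weather API)`,
};

// Crude stand-in for tool selection: overlap between query terms and
// description terms. Vague descriptions make this matching unreliable,
// which is the point being made above.
function pickTool(query: string, tools: Tool<any, any>[]) {
  const q = query.toLowerCase();
  return tools.find((t) =>
    t.description.toLowerCase().split(/\W+/).some((w) => w.length > 3 && q.includes(w))
  );
}
```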

57:39

Okay, here's another good question from, I think, Jean-Pierre, if I'm pronouncing that correctly: considering embeddings correspond to one semantic representation (context, signification), and some chunks might be dense in information and have multiple significations, what are the thoughts about possibly having multiple embedding vectors for a single

58:00

chunk? So what do we think about that? I have some general thoughts, but Nick, anything from you? Otherwise I can answer this one too. I guess, by multiple embedding vectors, they mean taking a chunk and embedding it multiple different ways? That's my interpretation. I would say you should experiment with these

58:23

kinds of things. These are good questions to ask, and one of the things I would highly recommend is to come up with some kind of evals and a dataset for judging retrieval and judging the final output. You've got to judge the retrieval, and you've got to judge the final output, and then you can start

58:40

to try different chunking strategies but also maybe embedding strategies: maybe you embed it in multiple ways and do multiple retrievals, passing that into context. Of course, a lot of times you're adding all this complexity for very little gain, and you increase

59:00

cost and latency, so those are all things you should really be thinking through as you're making those decisions. I don't have any strong indication that that would make it better; I think it would be use-case dependent, but it's certainly something you could consider trying. So: is chunking necessary? I

59:20

have distinct entities in my database, but they seem short enough to be sent to the embedding model directly without the chunking step. What do you think, Nick? It really does depend on your use case and what kind of embedding model you want to use. If it's small enough that you don't feel like you need chunking,

59:45

then that would be fine. But if it's long enough that retrieval becomes inefficient, then you probably want to consider chunking, because you don't want the chunks to be too big, or the text to be too big when you embed it; that makes it hard

1:00:05

for the tool or the agent, or however you want to query the vector DB, to properly get your info. Yeah, and related to that, one thing that I often see people do, kind of a little more of an advanced RAG strategy, is, especially if you do end up chunking, you not only get that chunk but

1:00:29

you might get the nearest neighbors to that chunk. Because if you think about how we would search a document, if we only get one paragraph, it's actually kind of nice to have the paragraph before and after, so you can get more of the context of what that actual paragraph means. It's kind of hard if you just have that one paragraph. So that's a little bit related to that question, but that's a good question.
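The neighbor-expansion idea can be sketched like this, assuming each chunk keeps its document id and position; a minimal illustration, not library code:

```typescript
// After retrieval returns a hit, also pull the chunks immediately before
// and after it (from the same document) so the model sees the surrounding
// context, not just one isolated paragraph.
type Chunk = { docId: string; index: number; text: string };

function withNeighbors(allChunks: Chunk[], hit: Chunk, radius = 1): Chunk[] {
  return allChunks
    .filter(
      (c) => c.docId === hit.docId && Math.abs(c.index - hit.index) <= radius
    )
    .sort((a, b) => a.index - b.index); // restore document order
}
```

Widening `radius` trades more context for more tokens, so it's another knob to evaluate with the evals mentioned above.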

1:00:48

So, any thoughts on... oh, go ahead, Nick. No, go ahead, Obby. So, we are really good friends with this company called Ragie, and they do RAG as a service. When we talked to them about chunking strategies, they said they do multiple chunking strategies, because they cannot determine... imagine, right, if you're running RAG as a service and

1:01:15

you're doing this for other people, you don't know what the documents are; truly, you don't know anything about the customer's documents, right? So they do something like 10 strategies and cycle through them to get the best results. So, to that question about whether you should do multiple: yeah, maybe.

1:01:36

If you truly don't know the source, maybe you should do different strategies just to shore things up and cover all your bases. But if you do know what you're getting into, then you should make more defined strategies. Okay, thank you, good answer. Okay: any thoughts on graph RAG with Mastra, Nick? So we

1:02:02

actually do have a graph RAG implementation in Mastra. It's its own separate tool as well, so you can use a create graph RAG tool and get a lot of good functionality out of graph RAG. For people who don't know what graph RAG is, it's basically a good way to keep connections between different vectors

1:02:28

possible. What it will do is create a graph of information, where when you insert the information into a DB, not only are you storing that info, you're storing the info and the connections between different pieces of data. So if two pieces of data are connected in some way, and you want to maintain that connection, then whenever you grab a certain piece of

1:02:55

info and it's got five different connections, and you want to know that those are all related, it will fetch the data you want, fetch the pieces of data that are connected to it, and be able to better explain the different connections. So it's pretty useful, and we have an implementation currently.
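A toy version of what that graph stores might look like this. The node/edge shapes here are illustrative assumptions, not the internals of Mastra's create graph RAG tool:

```typescript
// Alongside each stored chunk ("node"), keep edges to related chunks;
// on retrieval, follow the edges to also pull the connected data.
type GraphNode = { id: string; text: string };

const nodes = new Map<string, GraphNode>();
const edges = new Map<string, Set<string>>();

function insertNode(node: GraphNode, connectedTo: string[] = []) {
  nodes.set(node.id, node);
  if (!edges.has(node.id)) edges.set(node.id, new Set());
  for (const other of connectedTo) {
    edges.get(node.id)!.add(other);
    if (!edges.has(other)) edges.set(other, new Set());
    edges.get(other)!.add(node.id); // keep the connection in both directions
  }
}

// Fetch a node plus everything directly connected to it, so related
// pieces of data come back together.
function fetchWithConnections(id: string): GraphNode[] {
  const ids = [id, ...(edges.get(id) ?? [])];
  return ids.map((i) => nodes.get(i)).filter((n): n is GraphNode => !!n);
}
```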

1:03:24

Cool. Yeah, continuing on: we got a lot of good questions, so we'll answer a few more; we might not get to them all. But I will say, if you do have more questions about this stuff, you should come to our Discord and ask them there. We'll follow up after the fact and try to answer them and work with you if you have more specific questions as you dig into actually trying this stuff out. So that's kind of the plug for the

1:03:42

Discord. And actually, I'll share my screen just to kind of wrap this up, but there are a couple more questions here that we will take if we can get a couple more done. So give me one second here. I guess while I do this: how would you work to future-proof in terms of the embedding vectors, considering embedding models will evolve over time

1:04:04

and our database will grow over time? That's a great question. Yeah, go ahead. So, we will work on this incrementally. Okay, so when we worked at Gatsby, we built these incremental builds, right, where you only rebuild the part of the graph that changes. Now, that's on our roadmap, and that's the

1:04:33

dream of this, right: you're only changing the things that change over time. As embedding models get better, if the embedding model gets better, you have to reindex; that's just the truth, right? You have to reindex your data. That sucks, but it's the truth. But if you're

1:04:51

reindexing data and you only changed part of your graph, you only want to re-embed that part, right? You don't want to spend the money, the time, etc., reindexing everything. It's not easy to reindex just parts of your graph right now; that's a big problem, a big money problem actually, if you

1:05:10

think about it. So we want to do that. Embedding models are going to get better, people are going to fine-tune embedding models, and you are going to reindex your data; that's 100% certain. If you don't like that, just look in the mirror and accept it today. But you want to incrementally index things: within the same model

1:05:34

and same parameters, you want to reindex only the parts that change. I don't know if that helps, but that's just the truth. Yes, thank you.
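One common way to get that incremental behavior is content hashing: only re-embed chunks whose hash changed since the last run. A minimal sketch, with a toy hash standing in for a real one (this is an illustration of the idea, not Mastra's roadmap feature):

```typescript
// Toy content hash; a real system would use something like SHA-256.
function contentHash(text: string): number {
  let h = 0;
  for (const c of text) h = (h * 31 + c.charCodeAt(0)) | 0;
  return h;
}

// Compare the hashes saved at the last index run against the current
// chunks; only changed or new chunks need to be re-embedded.
function chunksToReindex(
  previous: Map<string, number>, // chunk id -> hash from the last run
  current: Map<string, string>   // chunk id -> current text
): string[] {
  const stale: string[] = [];
  for (const [id, text] of current) {
    if (previous.get(id) !== contentHash(text)) stale.push(id);
  }
  return stale;
}
```

Note this only avoids work within the same embedding model and parameters; as said above, swapping to a better model still forces a full reindex, because old and new vectors aren't comparable.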

1:05:59

All right, we do have a few more questions, but we are at time, so I'm just going to do a few quick plugs. If you're still here, please go star us on GitHub if you haven't already; we would appreciate it, and it helps more people find us. If you're interested in trying out our cloud product, go to the website and click on request access, or just join the waitlist. We're slowly rolling out our cloud product to more people, and we want to get feedback. You can connect with me or Nick on

1:06:19

Twitter, or me on LinkedIn; here are all the links. Thank you all for attending. You will get this in your inbox in a few hours, and if you didn't get your question answered, apologies, we ran out of time; please drop into our Discord and we will help answer those questions. Thanks everyone for attending, and have a great rest of your

1:06:42

day, everybody. Yes, thank you guys, thanks a lot, thank you, bye.
