Advanced RAG with LlamaIndex - Metadata Extraction [2025]

Alejandro AO · Beginner · 🔍 RAG & Vector Search · 1y ago

Skills: LLM Engineering 80% · RAG Basics 70% · Vector Stores 60%

Key Takeaways

This video covers advanced Retrieval-Augmented Generation (RAG) techniques, focusing on metadata augmentation and filtering using LlamaIndex, to improve RAG model performance and accuracy. It provides a hands-on notebook example using LlamaIndex and vector stores.

Full Transcript

Good morning everyone, how's it going today? Welcome back to the channel, and welcome to this tutorial, where we're going to be covering slightly more advanced topics on RAG. We're going to be building a complete RAG pipeline using LlamaIndex this time. If you're familiar with LlamaIndex, you know that there is a very famous five-liner that allows you to chat with pretty much any document you want. In this case, we're going to be building the entire pipeline ourselves, and that is going to allow us to perform some pretty sophisticated transformations on the text chunks that we're going to be extracting. To be more precise, we're going to be doing something that we call metadata augmentation: we're going to take every single one of the chunks that we extract from our PDFs and our documents and add, as metadata, example questions and answers that they can answer, example titles, etc., before embedding everything and putting it all together into the vector store. This is of course going to help us with ranking during the retrieval process, to find the most relevant pieces of information related to our query. So that's what we're going to be doing today; I hope that you find it interesting.

Before I forget, I also wanted to mention that I'm currently hosting an AI engineering cohort, which is, say, a 12-week course where I'm going to be live answering your questions, doing some Q&As with you, and having some face time with you, to take you from zero to hero so that you become an AI engineer and can start implementing all these topics in your job or in your freelance work. You can check the link in the description; the website's over here. Please join the waitlist. We're already 50 over there, so I can't wait to have you join too. So, without any further ado,
let's actually go build this super cool RAG pipeline.

All right, so before actually going into the notebook, I wanted to take a look at this diagram again, in case you're not familiar with the whole RAG-over-unstructured-data pipeline. Essentially, we're going to be extracting the text from our unstructured data right here. It can be Word documents, PDF files, HTML files, PowerPoint presentations, whatever. We're going to use a loader from LlamaIndex; in this case we're going to be using the SimpleDirectoryReader, which essentially just extracts the text out of pretty much any file that we have in a directory. From that we're going to get a bunch of documents in the form of text with their metadata. We're going to split that into different chunks as well, and something particular that we're going to be doing in this implementation is some extra transformations at this point right here. So once we have the chunks, which are going to be shorter so that we can send them as context to the language model, we're going to be adding some extra metadata to them. This metadata is going to be some examples of questions and answers that the information within the chunk is capable of answering. For example, if the chunk talks about, say, the Renaissance period, we're going to be adding some questions and answers about the Renaissance period to the metadata, to improve the embeddings and to put this chunk in the right place in the vector space. We're also going to be adding a custom title to it. And something pretty important right here is that we're going to be including that metadata in the embeddings. So we're not only embedding the chunk of text itself, we're embedding the metadata with it as well. That is quite important, and it's going to help with the retrieval. After that, it's straightforward RAG: the user is going to send their query, and something
pretty important here as well is that we have to apply the same transformations to the user query. At the very least we have to use the same embeddings model, and it would be ideal to apply the same transformations that we're using right here to the user query as well, although that is not mandatory. Once we have the embeddings of the query, we're going to send them to the vector database. The vector database is going to return to us the chunks of text that are the most relevant to the user query, and we're going to select the top-ranked one and send it as context to our language model, so that our language model is capable of answering a question about that topic. So that's essentially what we're going to be doing. As you can see, the newer thing here is the transformations. As I mentioned before, these are transformations that we're going to be applying using LlamaIndex. However, rest assured that if you're not using LlamaIndex, these transformations are pretty straightforward to apply in pretty much any other LLM framework. I'm using LlamaIndex here because it has these transformations pre-built as modules, but it shouldn't be very hard to implement this in, say, LangGraph, LangChain, or whatever RAG framework you're using. So there you go, that is what we're going to be building.

Let's get right into the notebook and start extracting information from this pretty nice paper that was released a few days ago. The first thing that we're going to do is install nest_asyncio, because we're going to be using some async code in this notebook, and in order to make sure that it works correctly in our notebook we should call nest_asyncio.apply(). Then we're going to be installing LlamaIndex. Something that I didn't mention is that we're going to be using pretty much only open-source models
to keep costs low. I'm going to be using Hugging Face for embeddings, and I'm going to be bringing my language models from Groq. I'm going to show that in just a moment. First things first, we're going to extract the data.

All right, so now it is time to actually extract the contents of our document. For this example I'm going to be using a single document; however, feel free to use as many as you want. I'm going to be using this one right here. It's a pretty recent paper on an AI called HealthGPT. I haven't actually read it through, but it's one of the newer ones, if you're interested. Essentially it introduces an AI that is capable of solving medical large vision-language model tasks. So this is what we're going to be using. It's a 19-page PDF document, and we're going to extract it using SimpleDirectoryReader. In order to do that, I created a data directory right here, and within it I just added my file. Once I have that file, all I have to do is call SimpleDirectoryReader and pass the input directory as a parameter. This SimpleDirectoryReader, which you import from LlamaIndex, essentially just extracts the text from every single document within that directory, and it doesn't matter which format you're using. You can have several PowerPoint presentations in there, you can have PDF files in it, and it's going to extract the text from all of those. So it's a pretty easy-to-use interface. We're going to run it like this, and as you can see, we get 19 documents from this single file. Of course, these 19 correspond to the 19 pages, so it generated a document for each page.

Now let's take a look at what they look like. As you can see, each document has an ID and a bunch of other metadata. To be clear, when talking about a document under the
framework of LlamaIndex, it essentially means an object that contains some information and is also linked to a source file. So this one right here is going to contain this text from the original document, and it's also going to contain a bunch of metadata that links it to its original file. We have an ID, we do not have embeddings, and we have a bunch of metadata. As you can see, we have one document per page, so 19 documents, and you have the label for the page number in the metadata as well. We have the file name, and the file path as well, which is pretty convenient, plus the file type, file size, the creation date, and the last modified date. This piece of metadata is quite useful. I'm going to go a little more deeply into it later, but it basically controls which metadata is going to be sent to your embeddings model, because you probably do not want, for example, the modified date to go to your embedding model. It's probably not going to be useful at all for locating the meaning of your document in the semantic space. Same thing for the language model; I'm going to go into more detail on that later. And there you go: we have no relationships; the metadata template, which we're also going to cover in a moment; the metadata separator, which we're going to cover a little later on; and the text resource, which is essentially what I showed you, all the text that corresponds to that page. So there you go, we have the data that was extracted conveniently for us. We didn't do anything other than use SimpleDirectoryReader, and it automatically added all of this very useful metadata for us.

Now, something very convenient, in case you want to do it: you can extract the documents using the file name as the ID. As you can see right here, I am using a universally unique ID, but
you can use the file name as the ID instead. Not like this, one second. That's also possible. I personally prefer to use just the regular unique identifier, but do whatever is convenient for your application. So there you go, that is it for the extraction. Now we're going to go to the more interesting part, which is the transformation, and we're going to be applying a lot of transformations that are going to help us improve the retrieval process.

All right, so we're getting to the fun part. We're talking about transformations: we're going to apply some very useful and sophisticated transformations to our data before sending them to the embeddings model and also to the language model. This is going to help us have a more reliable RAG pipeline. Something I should also mention is that although we're using LlamaIndex right now, and LlamaIndex has these transformations built in, they are not very difficult to implement if you want to use them in any other framework. In general, these are just very good techniques for improving the performance of your RAG pipeline. What we're going to be doing, essentially, is updating or expanding the metadata of each one of our documents before indexing it. As you can see, our documents look like this: they're super long, and they have a lot of metadata that we're probably not going to be using. You probably do not want to send all of this to either your embeddings model or your language model. So in order to fix this, we are first of all going to define which keys from your metadata are actually going to be sent to both your embeddings model and your language model.
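To make the idea of choosing which metadata keys reach each model concrete, here is a minimal plain-Python sketch. It is not LlamaIndex's implementation: the `Doc` class, the `render` function, the field names, and the sample values are all hypothetical and exist only to illustrate that the embeddings model and the language model can each receive a differently filtered and formatted view of the same document.

```python
from dataclasses import dataclass, field


@dataclass
class Doc:
    """Toy document (hypothetical): text, metadata, and per-model exclusion lists."""
    text: str
    metadata: dict
    excluded_embed_keys: list = field(default_factory=list)
    excluded_llm_keys: list = field(default_factory=list)
    metadata_separator: str = "\n"             # placed between metadata entries
    metadata_template: str = "{key}: {value}"  # how one entry is written
    text_template: str = "Metadata:\n{metadata_str}\n-----\nContent:\n{content}"


def render(doc: Doc, mode: str) -> str:
    """Return the string that the 'embed' or 'llm' consumer would receive."""
    if mode == "embed":
        excluded = set(doc.excluded_embed_keys)
    elif mode == "llm":
        excluded = set(doc.excluded_llm_keys)
    else:
        raise ValueError(f"unknown mode: {mode!r}")
    # Keep only the metadata entries that are not excluded for this consumer.
    metadata_str = doc.metadata_separator.join(
        doc.metadata_template.format(key=key, value=value)
        for key, value in doc.metadata.items()
        if key not in excluded
    )
    return doc.text_template.format(metadata_str=metadata_str, content=doc.text)


# Sample values are made up for illustration.
doc = Doc(
    text="This is a super-customized document",
    metadata={"file_name": "example.txt", "category": "finance", "author": "Jane"},
    excluded_embed_keys=["file_name"],  # not useful for semantic search
    excluded_llm_keys=["category"],     # not needed when answering
)
embed_view = render(doc, "embed")
llm_view = render(doc, "llm")
print(embed_view)
print("=" * 20)
print(llm_view)
```

Printing both views side by side is a quick way to check that nothing noisy, such as a page number or a modification date, ends up in the text that gets embedded.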
This is something to take into consideration, because sometimes you don't consider the fact that when you send your document or your node to your embeddings model before putting it into your vector database, you're actually sending the metadata as well, not only the content of the text. That can be useful in some cases, but only if you include the relevant parts of your metadata. In this case we're going to create a very simple document just to showcase this. The text of the document is 'This is a super-customized document'. By the way, this comes straight from the LlamaIndex documentation. Then we have three key-value pairs in the metadata: the file name, the category, and the author. Then we have this excluded_embed_metadata_keys, and I'm going to add the same thing for the LLM. I'm just going to comment this out first. These are the keys that are not going to be sent to the embeddings model, and I'm going to hide this for now too. So let's see what the embedding model is going to see. In order to see, just for testing, what your embedding model is going to see and try to embed in the vector space, we can use this MetadataMode module right here. You just call document.get_content, and as the metadata mode you pass MetadataMode.
EMBED right here. I'm going to execute this, and as you can see, the embedding model sees this: the category, the author, and then the contents of the document. Sometimes you want the category to be embedded, because that will probably help position your embeddings at the correct point. Sometimes you may want the author to be embedded as well, because you want that to be located within your vector space too. But let's suppose that, for example, you have the page number in there. You probably do not want the page number to be embedded and added to your vector; that's definitely not going to help you with the semantic search. So that is what this is useful for. Same thing for the language model. I'm just going to uncomment this part right here, and let's see what the language model actually sees. So, excluded_llm_metadata_keys, and right here I'm going to say that I'm going to remove the category. You can see that the LLM sees the file name, which is probably going to be important for the language model in case it wants to, for example, mention the source where it got the context from, if it's a RAG system. Then we have the author and then the content. Very convenient.

So far so good, but something else that I wanted to show you is how to format the metadata when it is sent to the language model and to your embeddings model. There are essentially three parameters that you can change. The first one is the metadata separator: essentially, which character goes between each metadata entry. In this case we have file name, category, and author, and I said that between each of them we have a line break. This, by the way, is the default value for this setting. Then for the metadata template we're going to have the key, then a colon, and the value, like you see
right here. It seems to be working fine for me, but you can update it if you want. You can add an arrow here instead, and it will do something like this, and this is what is going to be sent to the language model. This is probably not super useful, but just know that you can tweak it if your language model for some reason works better with other separators. This one right here is probably the more useful one. The text template essentially takes these two variables, metadata_str and content, and those are replaced with the metadata and the content of the document. So my metadata is going to be applied like this. I'm going to add a line break right here, then just a separator, and then the content like this. I'm going to add a line break here too, and let's take a look at what the language model sees now. As you can see, we have 'metadata', and the actual metadata right here, and then we have 'content' right here. This is probably going to make it easier for the language model to see what is going on. Or sometimes you may want to do it the Anthropic way. Well, I don't know if it was really the Anthropic guys who came up with this, but they are the first ones that I saw using it for language models: essentially, you put your data between markup tags. So there you go, that is how you format your metadata and make sure that it is correctly formatted before you send it to your language model or to your embeddings model.

So that's the first part. Now let's take a look at our own data and how it's supposed to look. Right here you have the page label and the file path, because remember that we had some values by default when we extracted our data. Let's find them right here. We had that the excluded metadata
keys for the embeddings are the file name, the file type, the file size, the creation date, the last modified date, and the last accessed date, which makes complete sense. You probably do not want to embed those; they have nothing to do with the semantic value of your vector. For the LLM, we are for some reason excluding the file name. I think that's okay, because we are not excluding the source, I mean the file path. The file type and the file size we don't need either. So it seems okay to me. But something that was not included here is the page label, so for some reason we are sending the page label to our embeddings model. We probably do not want that. In order to remove it, all we have to do is say that for every single document in the documents that we just extracted, we first of all add this nice text template with the metadata, as I showed you before, and then, if the page label is not in the excluded_embed_metadata_keys array of the document, we add it, so that the page label is not sent to our embeddings model. Let's execute that and take a look at how it looks now. What is going to be sent to our embedding model is the file path and the content. In this case we could also remove the file path if we wanted, but I'm going to leave it like that for now. So there you go, that is already adding some useful granular control over what is actually sent to your language model and to your embeddings model, to be sure that you do not send trash to your models that is just going to confuse them.

Now, the next thing that we're going to do is apply more sophisticated transformations, and for these transformations
we're going to be using language models to actually extract information from these nodes. So let's take a look at that.

Okay, so even more fun now. We're going to be extracting some information from each one of our documents before embedding them, to add more information to the embeddings. Right now, as you can see, we have our documents just as they are, and we did update a little bit the metadata that is going to be sent. But we want something even more granular. We want to have more control, and we also have to augment the metadata of each one of our documents to be sure that they are easier to retrieve. For example, let's suppose that the information that I want to query is located in this document right here. Let's go for the first one. So the information that I want to retrieve actually comes from the first document, but the first document is just the text itself. It's not a description of the text; it's just the text, so it just talks about something. An interesting technique to improve RAG is to extract a summary from this text, or a title, or some example questions and answers that this piece of text could potentially help solve. That's what we're going to be doing here. We're going to be using a language model so that, for each one of these documents, we extract a title and a set of questions and answers that this particular piece of information could potentially help us answer. In order to do this, we're going to be using some modules that are pre-built in LlamaIndex. But first of all, of course, to extract this information we're going to need a language model.
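The augmentation step described above can be sketched in plain Python, independent of any framework. This is only an illustration under stated assumptions: `augment_chunk`, the prompts, and `fake_llm` are hypothetical names, the metadata keys simply mirror the labels that appear later in the video's output, and the stand-in model returns canned text so that the example runs offline. In practice `llm` would wrap a real model call.

```python
from typing import Callable

# Hypothetical prompts; a real pipeline would tune these.
TITLE_PROMPT = "Give a short title for the following text:\n{text}"
QA_PROMPT = (
    "Write {n} questions that the following text can answer, "
    "each followed by its answer:\n{text}"
)


def augment_chunk(chunk: dict, llm: Callable[[str], str], num_questions: int = 3) -> dict:
    """Return a copy of the chunk with a generated title and Q&A added to its metadata."""
    metadata = dict(chunk.get("metadata", {}))  # copy, so the input is left untouched
    metadata["document_title"] = llm(TITLE_PROMPT.format(text=chunk["text"])).strip()
    metadata["questions_this_excerpt_can_answer"] = llm(
        QA_PROMPT.format(n=num_questions, text=chunk["text"])
    ).strip()
    return {"text": chunk["text"], "metadata": metadata}


def fake_llm(prompt: str) -> str:
    """Stand-in for a real model call (assumption), so the sketch runs offline."""
    if prompt.startswith("Give a short title"):
        return "Renaissance art overview"
    return "Q: Where did the Renaissance begin?\nA: In Italy."


chunk = {
    "text": "The Renaissance began in Italy in the 14th century.",
    "metadata": {"page_label": "1"},
}
augmented = augment_chunk(chunk, fake_llm)
print(augmented["metadata"])
```

Note that each chunk costs two model calls here (one for the title, one for the questions), which is why the number of chunks matters for rate limits and cost.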
In my case I'm going to be using an open-source language model. Here the use case is quite small: it's just extracting a summary, extracting a title, and extracting a list of questions and answers, and I can do this with a fairly small language model. I'm using Groq right here, so I'm going to take a look at the pricing. I'm pretty sure that a very small language model would be more than enough for this. Let's say this one right here: even Qwen 2.5 is going to be enough. It has a very affordable price tag per million tokens. For reference, OpenAI is like three times more expensive than even the most expensive one right here, and these ones are open source. So this is the one that I'm going to be using, and just for reference, it's located right here. In order to use it, I'm first going to create an API key right here. I'm going to call it temp-one, to remember that I have to remove it later. Actually, I'll call it temp-two, because I think I used temp-one already. I'm going to copy this, and let's go right here. From llama_index.llms.groq I'm going to import Groq, and I'm going to initialize my API key like this. Very straightforward. This is going to ask me for the API key. There you go, enter, and there we go. So here's the model that I'm going to be using for extracting these titles and also the example questions and answers. Also, if you're using Groq or whichever language model provider you're using, be sure to select one that has quite a high rate limit per minute, because we're going to be sending a lot of requests to the language model in batch, to make sure that we're getting the summary of every single document. So if you have 19 documents, for example, it's going to
send the document and then ask a question about it so that you get the title. Then it's going to do the same thing for the three questions, so you're going to end up with a bunch of API calls. Just be sure to factor that into your expenses as well, if you're paying for this. So this is the LLM for the transformations that I'm initializing. The transformations that I told you we're going to be doing use the TitleExtractor and the QuestionsAnsweredExtractor. Essentially, it's going to take each document, generate a title, and also generate a few pairs of questions and answers that that particular document is capable of responding to. And then we're going to be splitting every single one. Actually, in this particular case, maybe we want to use the text splitter... yeah, no, we probably need that at the beginning. That's a good point. In this case we're going to be using the IngestionPipeline. Essentially, this just takes a list of transformations that we're going to be applying to all of our nodes or all of our documents. We initialize the pipeline like this, IngestionPipeline, and then we pass in the transformations parameter, which is just an array of every single one of your transformations, in order of course. The first one that is going to be applied is a text splitter. We're initializing this SentenceSplitter with about a thousand tokens per split, and we're going to have an overlap of 128. Just remember that the overlap is there to make sure that we don't cut paragraphs in half or something like that. In any case, the SentenceSplitter already tries to make sure that no sentence gets cut in the middle, but this is still probably a good idea to use. So there you go, we initialize our
pipeline, and then we just call pipeline.run and pass in the documents. We say that in_place is true, and we're going to be showing the progress right here. The documents are the ones that I just extracted. Oh, and something that is quite important too: I initialize these title and Q&A extractors, and both of them require that you pass in a language model as a parameter, of course, because it is not going to just magically come up with a title and a set of questions and answers for your document. You have to actually use the language model to get them. Here are the batches and the number of questions that you want for each one of them. So there you go, it's parsing them first, and then it's going to start using my language model to generate the questions and answers, and then we can start seeing what it comes up with. I'm probably going to pause the video right here, because this may take a few minutes, so I'll be back in a moment.

Well, actually it didn't take that long; that was pretty fast. We should now have our documents with the metadata extracted and with a bunch of new things, so let's take a look at them. We have 30 new documents. Let's just take a look at one first. Before actually printing this, let's pretty-print everything and see what they look like. 'Module is not callable', so I did not import pprint correctly. There you go. So we have all the nodes. You can see that every single one has its ID, no embeddings yet, of course, since we have not done that yet, and a bunch of metadata. We have one node for the first page, two nodes for the second one, the third page was divided into two nodes, and the fourth one was left as it was. So there we go, we have a bunch of splits.
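The splitting-with-overlap step and the idea of running an ordered list of transformations can be sketched as follows. This is a simplified stand-in, not the library's splitter: it counts whitespace-separated words rather than tokens, and `sentence_chunks`, `make_splitter`, `add_word_count`, and `run_pipeline` are hypothetical names. The defaults of 1000 and 128 come from the values mentioned in the video; the demo uses tiny numbers so the overlap is visible.

```python
import re


def split_sentences(text: str) -> list:
    """Naive sentence split on ., ! or ? followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def sentence_chunks(text: str, chunk_size: int = 1000, overlap: int = 128) -> list:
    """Pack whole sentences into chunks of roughly chunk_size words, repeating
    trailing sentences (up to about `overlap` words) at the start of the next chunk.
    A single sentence longer than chunk_size is kept whole as an oversized chunk."""
    chunks, current, current_len = [], [], 0
    for sentence in split_sentences(text):
        n = len(sentence.split())
        if current and current_len + n > chunk_size:
            chunks.append(" ".join(current))
            # Carry over trailing sentences that fit within the overlap budget.
            carried, carried_len = [], 0
            for prev in reversed(current):
                prev_len = len(prev.split())
                if carried_len + prev_len > overlap:
                    break
                carried.insert(0, prev)
                carried_len += prev_len
            current, current_len = carried, carried_len
        current.append(sentence)
        current_len += n
    if current:
        chunks.append(" ".join(current))
    return chunks


def make_splitter(chunk_size: int, overlap: int):
    """Build a transformation that turns documents into smaller nodes."""
    def split(nodes: list) -> list:
        return [
            {"text": piece, "metadata": dict(node["metadata"])}
            for node in nodes
            for piece in sentence_chunks(node["text"], chunk_size, overlap)
        ]
    return split


def add_word_count(nodes: list) -> list:
    """Placeholder for an LLM-based extractor: adds one metadata field per node."""
    for node in nodes:
        node["metadata"]["word_count"] = len(node["text"].split())
    return nodes


def run_pipeline(nodes: list, transformations: list) -> list:
    """Apply each transformation in order; order matters (split first)."""
    for transform in transformations:
        nodes = transform(nodes)
    return nodes


docs = [{
    "text": "One two three. Four five six. Seven eight nine. Ten eleven twelve.",
    "metadata": {"page_label": "1"},
}]
nodes = run_pipeline(docs, [make_splitter(chunk_size=6, overlap=3), add_word_count])
for node in nodes:
    print(node)
```

Putting the splitter first means every later transformation works on chunk-sized nodes, which is also why one page can turn into more than one node.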
Of course, splitting is useful if you're using a language model with a smaller context window. If you have a pretty long context, you can probably use a larger chunk size than this one. Just to be clear, we have the file name right here too, and we have the file path, pretty convenient, and the file size. We have a document title, and there we go, we have the title, and then we have another title right here. A little later we have the set of questions and answers. I'm going to show you that in just a moment. First let me just check the first node. We have the first node right here, and I'm going to print it as a dictionary to actually take a look at it. This is the full dictionary. We have the embedding, which is None, the file name, file path, etc., and then here we have another very useful piece of metadata: questions that this excerpt can answer. You can see, of course, that it was a language model that generated it. As you can see: 'What are the specific findings of pathological changes observed in the fundus image provided in the document?', etc., and then the answer to the question is this one, etc. This is essentially going to help you locate this particular node in your vector space more accurately, because you will now have examples of the questions and answers that this particular piece of data can respond to, and it's going to be easier to retrieve it later on. Now, let's take a look at what the embeddings model is actually going to see when you send this kind of document. As you can see, it will first of all send the file path, the document title, and also a set of questions right here. This one is probably useful with longer documents. It's probably not super convenient to have a title for every single excerpt of a thousand words, but
in any case, this is also one technique. I think the one that is a little more useful and interesting is this one, the set of questions and answers that is going to be added to the embeddings. Then we have the content right here. Now let's take a look at what the LLM is going to see. The LLM also gets the title and the questions and answers, and then the actual content. Very convenient. So that is how to add a bunch of more useful information to every single one of the nodes that you're going to be embedding, before even creating your index. As I said before, this is useful not only when using LlamaIndex. It's going to be useful if you're using LangGraph, for example, or LangChain. It's probably going to be a little more manual, but these are techniques that in general will help you get better results in RAG.

Awesome. So now that we have our nodes complete, we have selected which metadata is going to be sent to the embedding model and to the language model, and we have also augmented that metadata, we're going to be able to create an index for them. In case you don't remember, an index is essentially a collection of all of your nodes organized to make them easier to retrieve. In RAG, this index is usually a semantic index, which means that it is backed by a vector database. In this case we're going to be using two different types of indexes. For the first one, we're going to embed our documents and put them into a simple list, without a vector database, and then we're going to be using a vector database. But first of all we have to select an embedding model. I usually use OpenAI because it's affordable and pretty fast and the API works great, but if you want to use a free one, you can
always use something from Hugging Face, for example, and that's what I'm going to do right now. If I remember correctly, I installed it up here. I should probably move this cell down here, to be clear that this is the one that I'm using. So I'm going to put it here, and I'm going to be using this one right here, just a small embedding model that should work. In order to do this, from llama_index.embeddings.huggingface we import HuggingFaceEmbedding, and here you just name whichever model you want to use from Hugging Face. Then right here I'm just testing it: I'm going to run get_text_embedding on a very quick 'hello world', just to show you what is actually returned. It's probably going to take a little bit of time, because it has to download the embeddings model and run it on my computer, but once that is finished... there you go. Now it has printed the embeddings for 'hello world'. This is essentially the multi-dimensional vector that represents the sentence 'hello world'. What we're going to be doing is passing this one right here to our simple vector store index to actually generate an index. So we run this part right here: from llama_index.core we import VectorStoreIndex, and we pass in our nodes. Of course, these are the nodes that we previously ran through several transformations and whose metadata we augmented. This is the index that we're going to be able to query afterwards. In order to query it, I'm going to wait for this to finish the embeddings, because it's probably going to take a little bit of time. When I'm running on Hugging Face it's a little slower. But there you go, now that this is finished we can
actually start querying this index. In order to do this I'm going to be using another language model. Previously I was using, I think it was Qwen, but in this case I'm going to be using a more powerful language model to synthesize the answers; I probably don't want a very small model doing that, so I'm going to use a bigger one. Which one did I choose? Llama 3.3 70B, so it is this one right here. I have a developer account on Groq, so I have higher limits, but you can use the free tier as well, and this model is also very affordable, so very convenient. I'm going to initialize it as llm_querying from Groq, which I imported before, and the API key is going to be the same one that I used before. Pretty much every index in LlamaIndex has a method called as_query_engine. It takes the language model that you're going to be using, and it essentially converts the index into a query engine to which you can pass any string; it will return an answer by adding the context that you previously loaded into your index.

So my question is going to be "what does this model do?" When you run the query, you can see the answer: the model, known as HealthGPT, is a medical large vision-language model, etc. It works pretty well. Now you already have a query engine that you can use on top of pretty much any document you want, and you have also augmented the metadata to make sure that the retrieval is much better than it would be if you had just extracted the contents as they are. Something else is that we can check the full schema of the response, because it's not just this text right here. The response actually contains much more information: it has the response text, yes, but it also has the source nodes where
this data came from. In this particular case we can actually take a look at this, so we're going to print it like this and see which nodes the answer was taken from. It took them from these two, and it says from page one and from page six. As you can see, it also retrieved part of the questions and answers that we had, because we embedded them as well. Very convenient. That is of course not the only piece of information that comes within the response object; we have a lot of things. I'm actually not going to use pretty print for this. We have the file name, the file path, the file size, etc., in case you want that, and it also has the document title. So you can see that even though all of this metadata was not sent to the language model or to the embedding model, you are still retrieving it, which is quite convenient. So that is how this works.

Now that you have an index, you probably do not want to re-embed all of your nodes every single time that you run this. If I restart this notebook, it's going to try to re-embed everything again, and I probably do not want that. I want my index to remain persistent on my machine so I don't have to spend time and money regenerating the embeddings. In order to do that, all I have to do is store this in a persistent store, so let's see how to do that.

All right, what we're going to be doing now is pretty much the same thing that we did before, but instead of doing it with just a simple key-value store for your vectors, we're going to be doing it with an actual vector database, because this is what you would probably do in a real-world scenario. So in order to
do this we're going to be installing ChromaDB. It's an open-source vector database, very useful and very easy to use, and of course we're going to be using the integration with LlamaIndex. I'm going to be running these installs right here: pip install chromadb and pip install llama-index-vector-stores-chroma. Once that's done, I'm going to initialize it from the same nodes that I have already generated. Remember that in the previous part of this lesson I already created the nodes and expanded them using several transformations in order to improve the RAG system. In this case we're going to be using those same nodes, but adding them to the vector database instead of just to a simple key-value store like this one right here.

In order to do that, we're first going to initialize chromadb.PersistentClient. For your information, this is not part of LlamaIndex; you're directly using the ChromaDB client from Chroma, which is a quick difference from how, for example, LangChain does it. LangChain does not require you to touch the third-party client. You can of course do that in LangChain too, but in LlamaIndex that is how they do it by default. So you initialize your ChromaDB persistent client and set a path for it; in this case I called it chroma_db, like this. We create a collection, of course, so we do get_or_create_collection, and I'm going to call it HealthGPT, like this. Then I'm going to assign Chroma as the vector store, so I'm going to say that vector_store is going to be equal to ChromaVectorStore, and the collection it takes is this one right here, which is the actual collection object returned from the Chroma client. So this one right here is where the native Chroma client
comes in contact with the third-party integration from LlamaIndex. Then we're just going to initialize our storage context, so StorageContext.from_defaults, just like before. But remember that above, when we created the index and called StorageContext.from_defaults, we just loaded the persist directory, and in this case we're loading the vector store. So that's a quick difference right here. Since this vector store is already linked to the persistent path, you don't have to pass in the persist directory here. Then you just initialize your index, VectorStoreIndex, and you pass in your storage context and your nodes, which are the ones that I had previously created. Then I'm just going to initialize the query engine, because an index always has this as_query_engine method, which returns a query engine that you can just query.

So I'm just going to execute this; it's probably going to take a little bit. I have an error right here. Oh yeah, I have to add the Hugging Face embedding model, like that. So there we go. It's probably going to take a little bit of time because it has to re-embed everything using my local embedding model from Hugging Face, and then I'm going to be able to initialize my query engine.

For reference, you can also do this with documents. In this case I'm running it directly on the nodes. If you want to run it directly on the documents that you get from your SimpleDirectoryReader, you can use VectorStoreIndex like this, and instead of calling it like that you can do from_documents and pass in the documents right here. If you want to do the transformations like we did before, you can pass them on the spot with the transformations parameter right here, and here you can do what we did before, like your splitter,
your title extractor, your Q&A extractor, etc. But that is just in case you want to load the documents directly. Now I can query this one again, using the vector database this time: "This model, specifically HealthGPT-L14, excels at medical visual answering tasks, achieving optimal or near-optimal results across all subtasks, with an average score of 74." So there you go. Now my answers are being retrieved from my ChromaDB vector store, which I have right here, and it's also persistent, so I can turn off this notebook or this server, or restart the application if I'm running this in a backend, and just load the Chroma vector store that I have in a persistent volume and run queries against it directly.

So that is essentially how you do slightly more advanced implementations of RAG using LlamaIndex. Do keep in mind that the things I just showed you are not specific to LlamaIndex or to pretty much any LLM framework that you may find. The most useful things that I showed you are the transformations that we did, essentially the metadata augmentation, and choosing which metadata is actually sent to your language model and to your embedding model. Those are things that you can absolutely do in any framework, be it LlamaIndex, LangGraph, or LangChain. The idea is that these are general techniques that help you improve the quality of your RAG pipeline for any application you're developing. LlamaIndex does make it quite easy because it has these modules already pre-built, but you can implement these techniques in pretty much any application you have. I hope this was useful for you. Let me know if you have any questions, and remember, if you want to learn more about this, I host a cohort where we go over all of these
topics, from zero to expert AI engineer. It's a 12-week cohort, and I'm there to answer all your questions and to make sure that everything is clear. So there you go. Thank you very much for watching, and I'll see you next time. [Music]
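As a recap of the indexing and persistence ideas from the walkthrough, here is a minimal, library-free sketch. It is not how LlamaIndex or Chroma implement anything: the names ToyNode, ToyIndex, embed, cosine, and the file name toy_index.json are hypothetical, and a bag-of-words counter stands in for a real embedding model. It only illustrates three points made in the video: selected metadata (such as generated questions) is prepended to the text that gets embedded, a simple index is a list of vectors searched by similarity, and saving the vectors means a restart does not have to re-embed everything.

```python
import json
import math
import re
import tempfile
from collections import Counter
from dataclasses import asdict, dataclass, field
from pathlib import Path


@dataclass
class ToyNode:
    """A text chunk plus metadata (hypothetical stand-in for a real node)."""
    text: str
    metadata: dict = field(default_factory=dict)
    # Metadata keys listed here stay on the node but are not embedded.
    excluded_embed_keys: list = field(default_factory=list)

    def embed_text(self) -> str:
        """Text sent to the embedding step: selected metadata, then content."""
        parts = [f"{k}: {v}" for k, v in self.metadata.items()
                 if k not in self.excluded_embed_keys]
        return "\n".join(parts + [self.text])


def embed(text: str) -> dict:
    """Toy 'embedding': lowercase word counts, NOT a real embedding model."""
    return dict(Counter(re.findall(r"[a-z0-9]+", text.lower())))


def cosine(a: dict, b: dict) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b.get(w, 0) for w in a)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


class ToyIndex:
    """A list of (vector, node) pairs searched by brute-force similarity."""

    def __init__(self, entries=None):
        self.entries = entries or []

    @classmethod
    def from_nodes(cls, nodes):
        # Embedding happens once, here; persist() saves the result.
        return cls([(embed(n.embed_text()), n) for n in nodes])

    def query(self, question, top_k=1):
        q = embed(question)
        ranked = sorted(self.entries, key=lambda e: cosine(q, e[0]), reverse=True)
        return [node for _, node in ranked[:top_k]]

    def persist(self, path):
        data = [{"vector": v, "node": asdict(n)} for v, n in self.entries]
        Path(path).write_text(json.dumps(data))

    @classmethod
    def load(cls, path):
        # Reloading reads stored vectors; nothing is re-embedded.
        data = json.loads(Path(path).read_text())
        return cls([(d["vector"], ToyNode(**d["node"])) for d in data])


# Made-up example chunks; the "questions" values imitate extracted metadata.
nodes = [
    ToyNode(
        text="The architecture combines a vision encoder with adapters.",
        metadata={"questions": "What does this model do?", "file_path": "/tmp/paper.pdf"},
        excluded_embed_keys=["file_path"],
    ),
    ToyNode(
        text="Training used two stages with different learning rates.",
        metadata={"questions": "How was training performed?", "file_path": "/tmp/paper.pdf"},
        excluded_embed_keys=["file_path"],
    ),
]

index = ToyIndex.from_nodes(nodes)
store_path = Path(tempfile.gettempdir()) / "toy_index.json"
index.persist(store_path)

restored = ToyIndex.load(store_path)  # e.g. after restarting the notebook
best = restored.query("what does this model do")[0]
print(best.text)
print(best.metadata["file_path"])  # excluded from embedding, still retrievable
```

In this toy setup the query shares no words with the raw text of the first chunk, so the match comes entirely from the question stored in its metadata. A real embedding model captures meaning rather than exact words, so the effect there is a matter of degree rather than all-or-nothing.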

Original Description

Learn advanced Retrieval-Augmented Generation (RAG), focusing on techniques such as metadata augmentation and metadata filtering using LlamaIndex. This will help you improve your RAG models for better performance and accuracy.

## Links
---
- 🚀 AI Engineering Bootcamp (open now): http://aibootcamp.dev/
- Notebook: https://colab.research.google.com/gist/alejandro-ao/75d18a00f0fbe6bacb57083c210178a1/llamaindex-vector-stores.ipynb
- Buy me a coffee (or a beer): https://buymeacoffee.com/alejandro.ao

## What You'll Learn
---
- Advanced RAG Techniques: Understand the intricacies of Retrieval-Augmented Generation and how it can boost your AI models.
- Metadata Augmentation: Learn how to enrich your data with metadata to improve retrieval and generation processes. Also, control what metadata is sent to your LLM and embedding model.
- Hands-On Notebook: Follow along as we explore a detailed Jupyter notebook, showcasing practical applications of these techniques.
- LlamaIndex Integration: See how LlamaIndex can be integrated into your workflow to streamline RAG processes.

## Timestamps
---
0:00 Intro
2:21 Quick Explanation of RAG
6:21 Setup
7:07 Extract text from File
12:26 Metadata Selection
22:40 Metadata Extraction
35:46 Create An Index
43:55 Use a Vector Store
50:15 Conclusion

This video teaches advanced RAG techniques using LlamaIndex and vector stores, with a focus on metadata augmentation and filtering. It provides a hands-on example to improve RAG model performance and accuracy. By following this lesson, you'll learn how to integrate LlamaIndex into your workflow and improve your LLM applications.

Key Takeaways

Set up a RAG pipeline
Extract text from files
Select and extract metadata
Create an index using LlamaIndex
Use a vector store for efficient retrieval
Integrate LlamaIndex into your workflow

💡 Metadata augmentation and filtering are crucial for improving RAG model performance and accuracy, and LlamaIndex can be used to streamline these processes.
