LLama 2 + PEFT Docs: CODE interactive LLM w/ RAG

Discover AI · Beginner ·🧠 Large Language Models ·2y ago

Skills: LLM Foundations85%Prompt Craft80%RAG Basics80%Advanced Prompting70%Vector Stores70%

Key Takeaways

The video demonstrates how to integrate new, external knowledge from webpages to a LLM (LLama 2 70B) using Retrieval Augmented Generation (RAG) to improve the accuracy of answers given by the AI, with tools such as LLaMA 2, PEFT Docs, Gradio, Streamlit, and Hugging Face.

Full Transcript

hello Community today we got a code yes a code session and I asked myself hey what about coding so on Lang chain 21 hours ago the most popular templates retrieval augmented generation chatbot and local reval augment generation bot so great let's focus on this you know there's a new version of gbd4 coming up you can upload your complete P PDF files here and extract the information without any plugins we get a new cut update we got a new multim mode so taken into consideration all of this information I ask you question and you answered and you said hey you're interested in some professional code implementation how does it work with auto code how does it work with co-pilot or with tb4 code interpreter next was you told me hey a lot of examples you have to pay for the openi key for the coher key or you have to pay for different vacor stores to get access there that's very expensive and sometimes you have some external no code Library where you have no idea what this thing is doing and the question was can we get rid of all of this and code it ourself and the answer is of course another question was can we have a complete coding tutorial and it means really from 0 to 100% And also the question with integrate this here some interactive user interface now you know we have here gradio or streamlet especially if we deal with llm I will go with CIO and final question from the side of my viewers was hey can we also have a final up and running app so that we can really see what's going to happen so that we can play around with the code we can modify the app according to our needs and of course we can do this so so here we are now and today we will Implement all of this and we will build a domain specific rack based llm for a Q&A session for a Q&A chatbot and this is the first video so this will be the easiest video our introduction video to code now before we jump into the code let's talk about what we going to achieve what is our structure I have a query and I input this query to to my llm and I expect a response can be a specific response in a text or in a coding exercise so here we choose a llama 2 model we have a model has been trained but is missing the actual information that the system has no information how to answer my query so what we do we go here to the main source of all information today we go here to the internet and we will find here our Pages where the information is in real time and we will use here rag to parse the HTML contact extract here in the first step the textual data we Chuck the text into smaller pieces based for example in the simplest version of the character count on the overlap and of course then we embed here all this information in a new mathematical space in a vector space and we will use here the sentence transformer for this mathematical mapping from the semantic space to a mathematical space where Vector represent the semantic similarity of our sentences just to give you an idea let's say we take now or we borrow here the content from 1,000 internet Pages we have about 100,000 sentences I only have one input query so we have to find you the answer to this my query in those 100,000 sentences we use here cosine similarity in a vector space to get the top 10 answers and within those top 10 answers we will rank this and to the best three answer and in the end we feed back here to our llm a very specific rack optimized generative eii prompt and in context learning prompt so we have the latest and best and most most useful information that the system feeds back in the llm in the llm will generate here a beautiful answer okay so yeah we will create an index we will rerank the results we will use you also something from the sentence Transformer which is called a cross encoder and I will show you how to create an ICL in context learning optimized prompt that is rack based this with feedback and this we will get here then the result I was asked no Lang chain no llama index no Vector store can you show us here a simple code exercise without any external helper function we want to code this and here we are we are going to code this of course Jerry on top we will use here an interactive user interface we will use gradio here so you have this up and running you can send the HTML link to your friends and they can try it outs now you know me so what do you think about a little challenge let's let's do something new now you have access to Jet gbt for free so I was asked hey can this be our code generator yes of course we want to have fun of course and some insights into the flow of coding how we structure this we will not work here on a collab or on another Jupiter notebook but hey I discovered there are something beautiful like hugging face spaces that we can generate we can program we can apply here from a Docker to a gradio we will work today with gradio there's the new version four you can publish your result as an interactive demo and in the result of this video we have built a new interactive AI system where our external knowledge here is the documentation the official hugging face documentation of the parameter efficient fine-tuning so we build ourself a coding Ai and you might ask why well in the next video we're going to use this coding AI for the next step remember this is the simple version just to give you a feeling we will follow here our hugging face Wizard and surup this is here the expect knowledge we will access today and if you go to for example GitHub you see under Pac-Man 100 we have here all his GitHub repost and I'll leave you the link in the description of this video so you have access to the files we're going to use and you can download it here either from hugging face or from GitHub the first step we always start with the data so from our internet we have to generate the data for our specific query then we do a little bit of rag engineering with an i prompt optimization then we have here a run of our llama 2 model with a rag we have an inference and we build here the interactive shell around this an interactive gradio user interface those are our four steps we're going to do today in this video so let's start with number one we borrow from the internet some real time content from specific web pages and remember this is the easy solution so we now go here and we tell the system hey please go to hugging face to this web page to this manual where path is explain laed where we have the complete a or API description and load here all those information from the official hugging face Pages explaining what PFT is so at first we are a single domain but of course you can have multiple web pages and go multi- domain and here we go I built here on a jet GPT conversation from a hugging face wizard I leave you the link in the description of this video and we just continue and follow his path so he said to jbd hey how do I Scrapy web pages can I use Scrappy and jbd comes back and says hey this is a powerful and flexible python web scrapping framework here is a stepbystep guide how to use crappy so you install it with a simple pip install command you create a new project you say hey my project for example you define the spider spider is a class that defines how to scrape a website in this example yes we go to this one and you see and now jbd creates the code you have here the specific class here of spider we have here our orl we have here the parse command then you run the spider and it's time to run the spider navigate to the yes yes yes and you have here simple command so that's it you have successfully used Scrappy to scrap data from a website and then user comes back and says hey update the above code to use Beautiful soap for the parsing Tob comes back say of course integrate beautiful uh soup here into a paing instead of Scrapy buildin CSS lectors so at first we have of course pip install beautiful soup and then we simply update here the code and we have here where's our beautiful soup response here our HTML parser and it explains what it did and says great beautiful now the human user says Hey I want the parser to ignore any multimedia data on this internet Pages such as images videos or audio so it's just interested here in a textual description and a textual code and save the content converted from HTML to text into a file whose name is simply the hash of the URL being po output the code for achieving this and now chat GPD comes back and does exactly this save the content text content to a file and go to the next page link beautiful so explains what did the file name will be something like beautiful. txt where beautiful is simply a specific hash of the URL so then human says hey can you write the code to scrap here now from hugging phase here from the documentation here a specific parameter efficient fine-tuning methodology using the steps you outlined above iterate over all the sub links also in the file and then chat gbd comes back say great so I just adapt now here and I build now for a specific internet address here my hugging face spider for example so calculates the hash save the context to a file follows all sub link beautiful so you see this is done by chat GPT beautiful explains what it did great and now the user says Hey for sublinks check that they really from here the documentation Pages comes back says okay let's do this update the code beautiful updated code is valid Su link okay you check for this tells you what to do now next instruction for the human so you see we build up step by step level by level write the file to an output folder let's call it output make sure to create the folder if not already present have this in the init logic op of the Scrapper so here we go this is done yes yes yes and next command is ADD logic to not parse any multimedia such as images videos or audio we do not go here in the first step to multi modal we just want to have here extraction of our text say okay you can check here the MIM type of the respond content before extracting and saving to the file and if you have have a look at this where are we and as you can see here we Define here is multimedia content and we Define here one of those image videos or audio classifications and then we say here in the pars command hey for the logger skipping multimedia content for this specific URL if given that this is one of those three definitions beautiful filter only the text from here specific elements with specific values then comes back says okay let's do this where are we here we go filter only the text here from the div elements with the specified class group or sup select and the code is provided to us by chat jbt updated code soup select CSS select yes find all the elements with the class attribute Direct rest of the code Remains the Same beautiful in a further definition here of this function now write a function to chunk text based on two parameters now normally we use here the sentence length but if you have extraordinary long sentences you need some other idea some simple idea so you can just Define here the chunk length so the maximum length of each chunk based on the number of characters and the overlap so the overlap lab of text between the consecutive junks and chap GPT writes here now with the two additional parameter chunk length and chunk overlap into our program beautiful and we have here the example chunk length for here in junk overlap 10 go with whatever is suited for your particular text explains here we iterate with the text starting from the beginning and incrementing by the chunk length minus chunk overlap in each iteration we extract the chunk of the text from the starting index to the ending index then we append the chunk to the chunks list great now the next further specification is hey use the B function to chunk all the files in a given specific folder and create now a panda data frame Panda data frame is something beautiful with the following three Fields the ID the content and the file name from which the chunk came and of course create your Panda data frame yes beautiful and you see the code is generated process file here and then we have here Panda data frames great and you can for example display here an example of this pan data frame additional command create a script for the above code taking in three arguments chunk length chunk overlap and one is missing the folder puff now okay says okay let's create a script that takes here the chunk overlap the length and the folder PFF in will process all the files in the specified folder chunk their content using the provided parameters and create a Pano data frame with the required Fields ID content and file name and then here jat GPT writes the code for us and it tells us hey to run this script I mean can't be easier just open your terminal or your command prompt and execute the following command and this is it with your specific parameter with your desired values okay and then we say okay please here use some specific for handling here the script arguments save now the output data frame here to an optimized file format and we go here with a par key file and whose PA is a script argument and so we just convert your p to par this is done beautiful overlap yes yes yes and then we have here P the data frame and then we simply say data frame to par key file everything is included this is an easy task and we have now a par key file and this is exactly where we then can start off since we now have the data in a file format that is optimized and the right way to go with a Paka J gbd comes back and says hey you just have to run this just type in script name. pi and your parameter and you're ready to go so now we have all the data the textual data at first in a panda frame and now we have a p file beautiful and optimized file format and now comes the part where we will use now here to get here or make this transformation here of our let's say the chunks are sentences of the text and now we transform the sentences to a vector representation in a vector space we call this the embedding function and we use here a sentence Transformer and an S bird to get the embeddings of a sentence so you see I don't need any vect to store any chain at all we just use here the good old expert multimodality so at first we install here our sentence Transformer library and I have more than 50 videos on my Channel about espert so please have a look if you have any question about how to apply sentence transform and optimize them very easy at first we say hey let's do something like get sentence embeddings so we just call here from hugging phase a specific trans sentence Transformer model that is available able on hugging face and if you have a look they for example go here with a bird model with a smaller Bird model it's called dis still bird base model with the tokens and then we just say okay now encode the sentences to get their Vector embedding representation so we a model do encode the sentences and return now the embeddings so what it does it does exactly here it takes in a textual sentence and returned is a numerical representation in a vector space where we can operate with a coign similarity to find semantic similar sentences beautiful this is the code there are about I don't know 50 to 100 different bird and eser models please go to esper.net and choose the best model for your specific task yes explains what it do great yes input file and output file so we say now okay now create a script that takes in the following arguments we have an input file where we have the text chunks created just one minute ago as I showed you here in our par key file file with the par key extension and then we want now an output file having here the B extension to save the output search index so what we do read the input paret file using the pandas you remember we have three columns the ID the content and the file name then use here the specified sentence Transformer function from above to embed each sample and create an index mapping from unique numerical ID to embedding using here a specific index Library functionality and save the created index to the output file great CH comes back says hello I'm ready to do this so we use this we read in the input from the P file we use our eser to embed here all text chunks now to a vector embedding and I will create you the code that a search index is created you see you don't need any Vector store or any anything else you do not have to pay for them it is so easy to do it yourself and this is the beauty because although we pretend we do not know how to code chat GPT is giving us here the complete code structure and later on I will show you we can use GPT 4 to even improve the code further but this is the first video here of this video series so we go with the basic model sentence Transformer word base great so here we have now a function to create here this specific index and this is here a library maybe I have here an interrupt and I show you this specific library and then we have our main function input file output file reading the PAR file get all the embeddings from the text chunks create now the index that we have save the created index to a new output file and this is it we are ready to go and here tgpd gives us the command hey just run here create here the index and this is it so we have now a script we'll read the text junks from the P file embed them using any sentence Transformer of your likes create an index and save the index to a file so this two files we need now in the further computation so more or less we can say hey this is great because now we have the data preparation and the cleaning data phase ETL more or less done here is another command by the human user says Hey load to save search index and write two function for creating embeddings of a given query because you remember we have now just the text converted to a m ma matical Vector representation but I started with my query to my llm and now to have a cosine similarity now from my query to all the text passages we have to have now also that my query that I have to my llm will be also transformed in a vector representation so using the search index and query the embeddings and now find the let's say 10 nearest Neighbors in the vector space so if I have mcar now in a vector I look for the nearest Vector representation here or the 15 nearest Vector representation and this is very easily done in a vector space TD comes back yes of course let's do this so you notice we create the index we load the index we have here our very embedding this is happening with the command you know model. end code here the query we return now the embeding and then we have SIMPLE function here find the nearest Vector neighbors given my quy so find the best 10 answers in the very close by Vector space so find the nearest neighbor for very embedding and we have here just we we utilize here a python Library great so again the main file here reading our pet create embeddings create the index load the index create an embedding for my specific query using a sentence Transformer of course the same model that we used to convert here our internet text content to a vector embedding find the nearest neighbors with this specific definition and then you print out the result and you say here even some chunk content the distance and you can give even have here a confidence level or a close by Metric relation but it will give you here let's say the best Five results and this is now the best Five results answered from external data that we can feed in now in an incom context learning and this will be an additional port to our prompt to our llm so we have achieved a complete rack process in this simple line of codes we did not have to know anything at all about python because jet gbt did everything for us it explains this great yeah in addition you can have now an ID to text the dictionary and use it to get a list of the retrieve text junks and put a logic in a function so let's do this and of course you know perform similarity search yes you notice now for five neighbors so we have our back key file we have our embeddings here we create the index yes yes L you notice now create your now ID to a specific dictionary and beautiful so with this latest version of the script we have added the perform similarity search function so this Vector operation in a mathematical space that is so much faster and it is beautiful to find semantic similar content which performs the entire similarity search process it returns then here specific dictionaries where the keys are the chunk IDs and the values are the retrieved five or top 10 text junk chunks if no query is provided okay yeah use the return to print to result and when you run this now with your specific parameters and you define the five or the 10 nearest results this is done now empirical data have shown us that this cosine similarity in a VOR space is not the optimal way we can do this with an expert system because to our buy encoder expert we also have a cross encoder expert I have a specific playlist with five or 10 videos how to use cross encoders on my channel on how to optimize cross encoders now we just say hey we need to improve here the ranking of our return results from the external data and to get a better and improved ranking we use now an expert cross encoder to rerank here our results yes yes yes so in the sentence Transformer Library tells me CH gbt we have the cross encoda class yes JD had a look at my videos from about one year ago the cross encoder model takes pairs of text segments and provides here now a single evalu ation score indicating the relevance of the second segment to the first one there are specific training data set that provides here our cross encoder with this functionality with this precision and remember if you have a specific domain knowledge like I work in physics and Mathematics I trained my own cross encoder on my specific data structure on the specific physical chemical biological names specific mathematical terms to have the optimal vocabulary to have the optimal encoding structure so I have here domain specific cross encoder if you want have a look at one of my videos so we have the result to rank now to retrieve junks by our cosine similarity with a better and more performant algorithm so again cross encoder are with our sort our sentence Transformer and we update now here simply the similarity search function let's go there so here we have our rerank the junks Swift now cross encoder so create a list of TOS each containing here this query chunk pair get the scores for each query chunk pair using the cross encoder so if we have now the top 10 query chunk pairs we get now if you want an evaluation an additional evaluation on the quality and a ranking of the top 10 uh answers so you sort the chunks based on the score now of the Cross encoder in a decending order beautiful yes yes yes index query yeah you know this now and then you have the result so in this again updated perform similarity function we use now to cross encoder class from the sentence Transformers from espert to load here a pre-trained cross encoder model I hope it is domain specific and you trained it on your specific domain so you can f tune here also the cross encoder and of course you can fine tune the sentence Transformer so you see you have multiple approaches for fine-tuning for optimizing your system but more about this in a later video then you use simp the re rank junks with the cross encoder function to R rank the top 10 results and now you have not top one top two top three but you have maybe top two top four top six here as the leading and best evaluated retrieved Trunks from our external data the script will rerank the retriev trunks using the cross encoder model beautiful so here is now where he here the official hugging face documentation that created this code ends here I will leave you the link of description of this video and then I just uploaded it here to my account and as you can see we here default GPT 3.5 our chat GPT I uploaded it and I just I was interested if I can continue and I said hey please summarize and explain the complete conversation in this session and I was hoping that bringing this conversation in it's more than a month ago is now here chat GPT able to have here a summarization of all of this what I just show you and it comes back and says certainly in this conversation the user started by asking yes you know we have been through this and then yes yes yes code exam explanation step-by-step instruction was provided request including web scrapping text chunking data frame creation the search index creation similarity search with our expert and then reranking with our cross encoder model so you see if you have here those chats available even here our hugging face wizards they show you unbelievable here if you do not know how to code or you encounter a code coding problem hey Jet gbt is able here in this case for example to do here the perfect data preparation and now we have here different files we have created an index we have created all our embeddings and we even have a cross encoder isn't that fantastic and we're going to use this now because now we're going to enter here the rack coding and we will now bring here our llm and our rack based new external data together to form here an optimized in context learning prompt engineered structure so part one data preparation check finished next part let's code now this exercise in a new space and I really mean in hugging face I logged into hugging face said cre create a new space for me so and I say here this is my llm rack license here and I go here with gradio so as you can see my Hardware requirements I just have a a CPU basic if you put in your credit card you can go for whatever you like and you see here the prices so you can go even with a 184 GB vram so there is enough power for any application that you would need I will go here with private because I show you here the way forward and then the final um space I give you also the LinkedIn for the final Space by our hugging face wizard so this is it this is all we have to do let's create the space and so get started with your gradio space yes beautiful but I I want to do this here very slowly very precisely so I go here now hugging phase and I create now near on hugging face space here my application my python file commit directly to the branch yes beautiful and now what I'm going to do I'm going to insert here the complete code and you will recognize this code so here we go edit and let's just insert have the code and let's start to understand what is happening here oh yeah we have now of course to upload here now our input data the two files that we created so how we do this easy we go here to files and as you can see here I just have to read me and here just the application but now I want to add here the files that we created with the different with the other code and let's just say upload Lo the files as you can see I have here my Pary file uploaded and of course here my eddings say commit directly to the main branch commit changes yes let's do this and now if you take some time now we have here our PAR file here with the chunks and we have our embeddings and yeah maybe we should do here for the dependencies a requirement txt let's do this create a new file you can do this here so let's see what we need what we need on dependencies we need our sentence Transformer esper.net we need here our index generator we go have to go to the hub we need scipi we numai we need the panders we need of course everything about hugging face Transformers and we will use Something Beautiful by Phil Schmidt it is called called easy llm and where we can switch here between for example open Ai embeddings and bird embeddings this is something I show you in the code so let's say this is it this is our com say that's it so if we go now here we have our requirements our embeddings our par key file with the data yes you can write a beautiful read me file and here we go now to our application now we go here in edit and let's have a look what we have well as I told you we have our sandance Transformers where we have our buy encoders our expert system and our cross encoder then we have our index generator NPI then we have here as a user interface I will use gradio if you want streamly there also another beautiful opportunities we go with our work our horse of pandas and torch great as I told you our easy llm here our helper is there and from the hugging face Transformer we have the tokenizer so prom Builder is llama yes then you have here your specific hugging face token remember you have to insert your right token here then you define the max new tokens the default Max new tokens and and and and embedding Dimension search index file the embeddings we just uploaded the PAR file and the cosign threshold for for calculating it a coine similarity in your vector space then if you have here an Nvidia um GPU you take care about Cuda or you say just CPU we will run only on CPU and then we go here with our models as I told you our llm here is a l L 270 billion jet hugging phase optimized model great and but please remember for this you have to go and get here the permission by llama by by Mei for the Llama to weights and everything so it's not really an open llm but never mind and then here for our rack we have here our buy encoder from sentence Transformer we use this specific model and for our cross encoder we go here with our mini language model that has been optimized on the MS Maru data set beautiful our tokenizer of course from the pre-trained model ID from our llama 2 and this is it now we have defined our three models we're going to use and now we just have to create here the interplay between this model and of course we need to create now a question and answer prompt yes yes yes create some condense question prompt so given the following conversation and a follow-up question rephrase the follow-up question this is just here some if you have the chat history some optimization yes then get the prompt beautiful you notice get input token length here we have now from field Schmid as I told you here a specific code completion as you see you put in your temperature your top k your top P everything else you can go then you load here the index where we provided here the index files already and then create the index for the PF docks from the numine Bings and avoid the arching mistakes when creating the search index you notice this we have done already on my videos and you simply create here the very embedding as I just showed you here with our buy encoder and you find here again here the same code then next nearest top 10 neighbors this is it then you use here the cross encoder to have here the fine tuning of our top 10 ranking and you get here the optimized fine tuned methodology and then finally we can have here our J generate here the condense query and yeah you notice the interplay between the human and the assistant and the system prompt here is for example please insert what you like you are helpful respectful and honest assistant please answer always helpfully imposs while being safe your answer should not include any harmful unethical you notice this is clear description what we call it we call it a path based documentation question and answer chatbase based here on a llama 270b model remember for llama 2 there are specific uh legal requirements yes yes yes if no torch is available just say hey I'm running here on a CPU and then we have here clear the text box and because we are already working here on the gradi user interface you notice me system iterator yes qu embedding now everything comes together beautiful and you define your output process example check input token length if it is too long gives you a warning beautiful then what we have here now we load here our two files you know we have here now the search index and the second is here the read the par key file here and a Pano data frame beautiful and this is it the rest now is here absolutely just the gradio commands with the gradio blocks in a specific style we have markdown we have the group we have our row you see we have here the retry btom the undo bottom the clear the input btom and then we have here an on where we have our Advanced option where we can say okay the maximum new tokens or you define here on a slider your temperature you define here the top P or the top K parameter of course you can provide here for the demo some examples so the user just clicks on it and gets this as the input string yes yes yes success generate the output button event pre-process yes this is gradio only retry to click yes beautiful this is just here for our gradio user interface and this is it this is it and you might say so and now now you just go here and runtime error let's have a look at this at the app and you get here yeah of course if you click now on the app you will see it if you input your hugging face token so if I insert my hugging face token here we are now we are building as you can see in real time take some time okay let's see okay so everything is loading beautiful this looks interesting yes we are moving along so yes downloading everything that we need beautiful okay okay there is no problem the above exception okay runtime error great what is it now we look here you're trying to access a gated Reaper llama 2 do I have permission to access the Llama 2 model on hugging face are you joking is still is this really a thing in November 2023 wait a second wait a second okay here yes look cannot access gated repo for llama 2 you request to access the Llama 2 model I still have to yeah I have created a new account on hugging face for this video and my email is now I must request from meta a new permission that I'm allowed to use Lama to so let us go to the official PF docs Q&A jetbot print a Laura config demo and here you have it so you see this is exactly what you get you get an explanation from the manual here from the API and you have here also some code droplets it explains how you can configure your PA mod you can say hey print a code for the Laura config then you get a short explanation what this is all about takes a base model A configuration object with an input and returns a trainable PFT model and then you get here as you can see a simple python code so you do as always you have your Transformer you have your P your setup the Laur configuration yes this is exactly here the following parameters that I already showed you and then you have here the base model we take your T5 base model model and then you wrap the base model with the get PA model function to create a PA model beautiful gives you also some additional further information you can customize here the lower config object to adjust the scaling factor to drop a probability of the lower layers or replace it with a different P method if it is desired for the user interface just to be specific here you see here we have the Advanced option as you already saw the system prompt you can put here your personal system prompt whatever you want the system to behave the maximum number of tokens the temperature of the system the top p and the top K parameter given within a specific range you have here some examples how do you use lower with custom models you see it's in there just say submit and you get an answer so we have here more or less e Compu complete here gadio user interface and you see how can I use it you instantiate a base mod create a lurer config wrap the base model with get PA model to get trainable model and train the PA model as you normally would train this so you have here one two three and four steps exactly how to do this and you can ask the system also question and explain its specific term so there you have it now as an Outlook to the next videos remember that we can optimize jet GPT code python code for example with a specific configuration that we will apply here for gbd4 so gbd4 will tell us how good is our code how we can optimize our code I would like to make a video about this in the UPC in the next upcoming videos of course as you said we can have here the data from our realtime internet data and instead of providing those data with your rack based API we can have if we have the data we can use the data to finetune here our llm and if we don't not have a extreme compute infrastructure available we will not go with fine tuning this is very expensive but we will will use here the parameter efficient fine-tuning for our llm with the new data and maybe I show you some parallelization optimization how we can do this so instead of using here some external data and just use here the method here of rack based we can of course here also fine tune and parameter efficiently fune our llm with the new data which has a big Advantage because now our llm has been trained on this data and is able to make some logic deduction maybe even some reasoning on the data while if you just focus here on a rack based API um generation here of an answer to my query we just feed here specific information to thei within our prompt but the system does not really inham curently learn this information so this is the outlook for the next videos I hope it was informative and it would be great to see you in my next video

Original Description

A complete code tutorial to integrate new, external knowledge from webpages to a LLM (LLama 2 70B) in order to improve the accuracy of answers given by the AI. By Retrieval Augmented Generation (RAG) to improve the response of any AI. RAG based information added in the form of an ICL prompt to LLama 2. RAG and LLM Integration: A Comprehensive Coding Guide for AI Enthusiasts. Building Intelligence: A Programmer's Guide to Retrieval-Augmented Generation in LLMs. LLama 2 + PEFT Manual: Code an interactive Gen AI w/ RAG Augment the basic knowledge of an AI /LLM with up-to-date date data and information from trusted sources of the internet. No LangChain, no LLamaIndex, no vector stores. Just pure code. We'll use ChatGPT to scrape the PEFT documentation using scrapy and BeautifulSoap, chunk it, embed the chunks in a new vector space using sentence-transformers, create index using hnswlib and loading the search index and utils for embedding user query. Similarity search is performed in this vector space to find the TOP10 best answers, given our body of external knowledge. Then we use a cross-encoder for the reranking of the semantic similarity and feed the best answers back into our LLM, to augment the ICL prompt to LLAMA 2. Code sources (all rights with original authors): https://huggingface.co/spaces/smangrul/PEFT-Docs-QA-Chatbot https://huggingface.co/spaces/smangrul/PEFT-Docs-QA-Chatbot/blob/main/app.py https://chat.openai.com/share/eb3079ba-1379-4b9a-b21c-839feb023309 #coding #ai #huggingface

Watch on YouTube ↗ (saves to browser)

Sign in to unlock AI tutor explanation · ⚡30

Playlist

Uploads from Discover AI · Discover AI · 0 of 60

← Previous Next →

Step Into the Unknown (by YouChat) - May 2023 be your best year yet

Step Into the Unknown (by YouChat) - May 2023 be your best year yet

Wishing you all an amazing 2023 filled with Love, Laughter, and Happiness!

Wishing you all an amazing 2023 filled with Love, Laughter, and Happiness!

Create a Smarter Future!

Create a Smarter Future!

The Art of Text to Vector Transformation: A Comprehensive Look at AI and NLP Transformers

The Art of Text to Vector Transformation: A Comprehensive Look at AI and NLP Transformers

Feature Vectors: The Key to Unlocking the Power of BERT and SBERT Transformer Models

Feature Vectors: The Key to Unlocking the Power of BERT and SBERT Transformer Models

Domain-Specific AI Models: How to Create Customized BERT and SBERT Models for Your Business

Domain-Specific AI Models: How to Create Customized BERT and SBERT Models for Your Business

Achieve Unimaginable Levels of Domain Knowledge through SBERT Extreme in 3D (SBERT 48)

Achieve Unimaginable Levels of Domain Knowledge through SBERT Extreme in 3D (SBERT 48)

Unlocking Scientific Domain Knowledge w/ BPE Tokenizer: An Amazing Journey! (SBERT 49)

Unlocking Scientific Domain Knowledge w/ BPE Tokenizer: An Amazing Journey! (SBERT 49)

SBERT Extreme 3D: Train a BERT Tokenizer on your (scientific) Domain Knowledge (SBERT 50)

SBERT Extreme 3D: Train a BERT Tokenizer on your (scientific) Domain Knowledge (SBERT 50)

Discover Vision Transformer (ViT) Tech in 2023

Discover Vision Transformer (ViT) Tech in 2023

Pre-Train BERT from scratch: Solution for Company Domain Knowledge Data | PyTorch (SBERT 51)

Pre-Train BERT from scratch: Solution for Company Domain Knowledge Data | PyTorch (SBERT 51)

Flan-T5-XL model on a free COLAB | A free LLM - that explains itself w/ reasoning /write essay | AI

Flan-T5-XL model on a free COLAB | A free LLM - that explains itself w/ reasoning /write essay | AI

BERT and GPT in Language Models like ChatGPT or BLOOM | EASY Tutorial on Large Language Models LLM

BERT and GPT in Language Models like ChatGPT or BLOOM | EASY Tutorial on Large Language Models LLM

Free Alternative to ChatGPT: Flan-T5-XL GUI (open-source) #shorts

Free Alternative to ChatGPT: Flan-T5-XL GUI (open-source) #shorts

From T5 to T5X: A Game-Changing Evolution with JAX & FLAX

From T5 to T5X: A Game-Changing Evolution with JAX & FLAX

How to start with ChatGPT? | Short Introduction to OpenAI API #shorts

How to start with ChatGPT? | Short Introduction to OpenAI API #shorts

The Future of Conversational AI? Google's PaLM w/ RLHF | LLM ChatGPT Competitor

The Future of Conversational AI? Google's PaLM w/ RLHF | LLM ChatGPT Competitor

Microsoft and ChatGPU

Microsoft and ChatGPU

From Zero to FLAN-T5 XL Model GUI with Gradio: A Step-by-Step Guide on Free COLAB Notebook PyTorch

From Zero to FLAN-T5 XL Model GUI with Gradio: A Step-by-Step Guide on Free COLAB Notebook PyTorch

Google's 2nd Answer to "BING ChatGPT": Sparrow | after BARD w/ LaMDA | 2nd Gen Conversational AI

Google's 2nd Answer to "BING ChatGPT": Sparrow | after BARD w/ LaMDA | 2nd Gen Conversational AI

TF2: Pre-Train BERT from scratch (a Transformer), fine-tune & run inference on text | KERAS NLP

TF2: Pre-Train BERT from scratch (a Transformer), fine-tune & run inference on text | KERAS NLP

3D Visualization for BERT: How to Pre-Train with a New Layer & Fine-Tune with Downstream Task Layer

3D Visualization for BERT: How to Pre-Train with a New Layer & Fine-Tune with Downstream Task Layer

FLAN-T5-XXL on NVIDIA A100 GPU w/ HF Inference Endpoints, let's explore 11b models!

FLAN-T5-XXL on NVIDIA A100 GPU w/ HF Inference Endpoints, let's explore 11b models!

ChatGPT - Can it Lie to you?

ChatGPT - Can it Lie to you?

ChatGPT Alternative: Perplexity by Perplexity.AI

ChatGPT Alternative: Perplexity by Perplexity.AI

2023 KerasNLP Tutorial: Explore Latest KERAS Toolbox & NLP Processing Library for BERT - TF2

2023 KerasNLP Tutorial: Explore Latest KERAS Toolbox & NLP Processing Library for BERT - TF2

Self-aware AI: You.com/chat vs Perplexity.ai | Live Demo, LLMs show Future of ChatGPT w/ BING

Self-aware AI: You.com/chat vs Perplexity.ai | Live Demo, LLMs show Future of ChatGPT w/ BING

BLOOM 176B Inference on AWS | Bigger than GPT-3 for more Power!

BLOOM 176B Inference on AWS | Bigger than GPT-3 for more Power!

Fine-tune ChatGPT? Buy Embeddings /OpenAI? What are Embeddings? My own ChatGPT? | Visual Q+A

Fine-tune ChatGPT? Buy Embeddings /OpenAI? What are Embeddings? My own ChatGPT? | Visual Q+A

Unleashing the Power of BLOOM 176B with AWS ml.p4de.24xlarge, DJL & DeepSpeed: The Ultimate Boost!

Unleashing the Power of BLOOM 176B with AWS ml.p4de.24xlarge, DJL & DeepSpeed: The Ultimate Boost!

After ChatGPT: NEW BioGPT by Microsoft | Do YOU trust Microsoft for your Medication?

After ChatGPT: NEW BioGPT by Microsoft | Do YOU trust Microsoft for your Medication?

Improve ChatGPT: Modular, Adaptive, Smart LLM | Inside ChatGPT

Improve ChatGPT: Modular, Adaptive, Smart LLM | Inside ChatGPT

Fine-tune ChatGPT w/ in-context learning ICL - Chain of Thought, AMA, reasoning & acting: ReAct

Fine-tune ChatGPT w/ in-context learning ICL - Chain of Thought, AMA, reasoning & acting: ReAct

The Intersection of Copyright Law and Human Faces: Exploring Virtual K-Pop with MAVE

The Intersection of Copyright Law and Human Faces: Exploring Virtual K-Pop with MAVE

New TECH: Vision Transformer 2023 on Image Classification | AI

New TECH: Vision Transformer 2023 on Image Classification | AI

PyTorch code Vision Transformer: Apply ViT models pre-trained and fine-tuned | AI Tech

PyTorch code Vision Transformer: Apply ViT models pre-trained and fine-tuned | AI Tech

New BING ChatGPT: Unlock the Power of Emotions in your Search Engine!

New BING ChatGPT: Unlock the Power of Emotions in your Search Engine!

New BING ChatGPT loses its mind

New BING ChatGPT loses its mind

Self-Attention Heads of last Layer of Vision Transformer (ViT) visualized (pre-trained with DINO)

Self-Attention Heads of last Layer of Vision Transformer (ViT) visualized (pre-trained with DINO)

Visualizing the Self-Attention Head of the Last Layer in DINO ViT: A Unique Perspective on Vision AI

Visualizing the Self-Attention Head of the Last Layer in DINO ViT: A Unique Perspective on Vision AI

Microsoft strongly restricts access to ChatGPT on new BING - WHY?

Microsoft strongly restricts access to ChatGPT on new BING - WHY?

PyTorch ViT: The Ultimate Guide to Fine-Tuning for Object Identification (COLAB)

PyTorch ViT: The Ultimate Guide to Fine-Tuning for Object Identification (COLAB)

New BING Chat AGGRESSIVE

New BING Chat AGGRESSIVE

Panoptic Image Segmentation: Mask2Former explained | Identify all objects!

Panoptic Image Segmentation: Mask2Former explained | Identify all objects!

Code Panoptic Image Segmentation w/ Vision Transformer & Mask2Former - A PyTorch tutorial

Code Panoptic Image Segmentation w/ Vision Transformer & Mask2Former - A PyTorch tutorial

Dream Job Alert: AI Prompt Engineer - $335K | AI Prompt Design: A Crash Course

Dream Job Alert: AI Prompt Engineer - $335K | AI Prompt Design: A Crash Course

Streamlining Similar Image Detection with ViT in PyTorch: A Step-by-Step Guide

Streamlining Similar Image Detection with ViT in PyTorch: A Step-by-Step Guide

Microsoft's CEO in Trouble #shorts

Microsoft's CEO in Trouble #shorts

Why wait for KOSMOS-1? Code a VISION - LLM w/ ViT, Flan-T5 LLM and BLIP-2: Multimodal LLMs (MLLM)

Why wait for KOSMOS-1? Code a VISION - LLM w/ ViT, Flan-T5 LLM and BLIP-2: Multimodal LLMs (MLLM)

OpenAI's ChatGPT can NOW summarize external Sources on the Internet?

OpenAI's ChatGPT can NOW summarize external Sources on the Internet?

ChatGPT polarizes

ChatGPT polarizes

Hospital /Clinic AI Decision Models: Performance of 12 AI LLM Systems (incl $$) Radiology, Biomed

Hospital /Clinic AI Decision Models: Performance of 12 AI LLM Systems (incl $$) Radiology, Biomed

ChatGPT Prompt Engineering w/ in-context learning (ICL) - 7 Examples | Tutorial

ChatGPT Prompt Engineering w/ in-context learning (ICL) - 7 Examples | Tutorial

Chat with your Image! BLIP-2 connects Q-Former w/ VISION-LANGUAGE models (ViT & T5 LLM)

Chat with your Image! BLIP-2 connects Q-Former w/ VISION-LANGUAGE models (ViT & T5 LLM)

ChatGPT: Multidimensional Prompts

ChatGPT: Multidimensional Prompts

ChatGPT: In-context Retrieval-Augmented Learning (IC-RALM) | In-context Learning (ICL) Examples

ChatGPT: In-context Retrieval-Augmented Learning (IC-RALM) | In-context Learning (ICL) Examples

Code your BLIP-2 APP: VISION Transformer (ViT) + Chat LLM (Flan-T5) = MLLM

Code your BLIP-2 APP: VISION Transformer (ViT) + Chat LLM (Flan-T5) = MLLM

Buy Microsoft "Azure OpenAI Service" or buy from OpenAI its API for ChatGPT access & tuning?

Buy Microsoft "Azure OpenAI Service" or buy from OpenAI its API for ChatGPT access & tuning?

Pretraining vs Fine-tuning vs In-context Learning of LLM (GPT-x) EXPLAINED | Ultimate Guide ($)

Pretraining vs Fine-tuning vs In-context Learning of LLM (GPT-x) EXPLAINED | Ultimate Guide ($)

Reversible Transformer: ReFORMER for GPU Memory Optimization! Reversible Residual Layers?

Reversible Transformer: ReFORMER for GPU Memory Optimization! Reversible Residual Layers?

This video teaches how to integrate new, external knowledge from webpages to a LLM (LLama 2 70B) using Retrieval Augmented Generation (RAG) to improve the accuracy of answers given by the AI. It covers tools such as LLaMA 2, PEFT Docs, Gradio, Streamlit, and Hugging Face, and demonstrates how to create a domain-specific RAG-based LLM, use sentence transformer for vector space embedding, and rank top 10 answers with cosine similarity in vector space.

Key Takeaways

Build domain-specific RAG-based LLM
Create ICL in context learning optimized prompt with feedback
Use sentence transformer for vector space embedding
Rank top 10 answers with cosine similarity in vector space
Create a search index using a specific index library functionality
Use cosine similarity to find semantic similar sentences

💡 The key insight of this video is that RAG can be used to improve the accuracy of answers given by a LLM by integrating new, external knowledge from webpages.

🔒 Pro feature: Ask AI to explain this lesson →

More on: LLM Foundations

View skill →

Getting Started with Vertex AI Gemini 1.5 Flash

I TRAINED AN AI TO SOLVE 2+2 (w/ Live Coding)

I TRAINED AN AI TO SOLVE 2+2 (w/ Live Coding)

How to use the ChatGPT API with Python!!

How to use the ChatGPT API with Python!!

Nicholas Renotte

Gemini 2.5: Create an interactive plot of economic data

Gemini 2.5: Create an interactive plot of economic data

Google DeepMind

LangChain Chatbots: Building a Personalized AI Assistant

LangChain Chatbots: Building a Personalized AI Assistant

Analytics Vidhya

Auto-generating meeting notes with Python

Auto-generating meeting notes with Python

Related AI Lessons

Claude AI vs ChatGPT: Which One Is Actually Better in 2026?

Compare Claude AI and ChatGPT based on real-world usage and benchmarking to determine which one is better in 2026

Claude AI vs ChatGPT: Which One Is Actually Better in 2026?

Compare Claude AI and ChatGPT to determine which AI model is better for your needs in 2026

Medium · Programming

IntelliBooks: Classic RAG vs Graph RAG vs Agentic RAG – Choosing the Right AI Retrieval Architecture for Enterprise AI

Learn to choose the right AI retrieval architecture for enterprise AI between Classic RAG, Graph RAG, and Agentic RAG

Fluid, natural voice translation with Gemini 3.5 Live Translate

Learn about Gemini 3.5 Live Translate, a new voice translation technology that enables fluid and natural conversations across languages

5 Levels of AI Agents - From Simple LLM Calls to Multi-Agent Systems

Dave Ebbelaar (LLM Eng)