Development with Large Language Models Tutorial – OpenAI, Langchain, Agents, Chroma

freeCodeCamp.org · Beginner ·🧠 Large Language Models ·2y ago

Skills: LLM Foundations90%LLM Engineering80%Prompt Craft70%

Key Takeaways

This video tutorial covers development with Large Language Models (LLMs) using OpenAI, Langchain, Agents, and Chroma, with hands-on projects to create dynamic interfaces and interact with text data.

Full Transcript

welcome to this course about development with large language models or llms throughout this course you will complete Hands-On projects that will help you learn how to harness the immense potential of LMS for your own projects you'll build projects with LNS that will enable you to create Dynamic interfaces interact with vast amounts of Text data and even Empower LMS with the capability to browse the internet for research papers you'll learn about the intricate workings of LMS from their historical Origins to the algorithms that power models like GPT option is a passionate educator in the field and he teaches this course hi welcome to this course on llm engineering and development brought to you by loop.ai my name is akshat and I'm excited to be teaching you guys this course so who should watch this anyone who's interested to learn Hands-On llam usage in theory through explanations and multiple guide projects paired with a fairly basic Python Programming knowledge should be pretty comfortable following along so here are all the projects that we're going to be working towards we're going to be able to create a clone of the chat gbd user interface along with the large language model that will help us interact with it with custom personas we're going to be able to have conversation so with our documents like text files and PDF files uh that chair gbd may have not been trained on we're going to be able to use Asians which are self-prompting large language models and enable these large language models to browse the web and research and literary research papers using the archive API we're going to be able to enable these large language models to use more than five tools in the real world along with equipping them with their own custom tools so here's all the course content and let's get started with the basic introduction to other guns so an llm basically happens when you combine a massive neural network with huge amounts of data and train it on this huge amounts of data once it's been trained you then align it to human values in an attempt to create a reasoning engine so examples of these LMS are bird llama more famously gbt 3.5 and gpt4 so the concept of alums has been around since 1996 but why have they only recently gained traction the reason why is because this is the first time in history where llms have been actually able to outperform human reasoning in certain contexts and the reason for this is due to a huge import improvements in performance and scale now more on scale uh as you can see modern llms like gbd 3.5 have a huge amount of money and skill that goes into developing them GP 3.5 has over 175 billion parameters and you can think of parameters as neurons in your brain along with this it has huge amounts of training data GPT 3.5 is not where this cycle of LM development stops because open air has another model called gpt4 that is essentially an upgraded version of GPT 2.5 that performs all the data GPT 3.5 can but just seems to meet the goal a lot better the reason for this is due to its huge amount of parameters 1.76 trillion and with this it has the capacity to undergo training with even greater amounts of training data a huge model like this that's trained on a huge amount of training data is pretty dangerous to humanity so the demand for aligning this model to human values and feedback is also very important and plays a very important role apart from GB 3.5 and gpd4 by openai there are several competitors in this landscape like Microsoft Facebook Google who are all actively publishing resubscribers and breakthroughs pretty much every week at the time when this is recorded so now let's look through some of the algorithmic breakthroughs that got us to this point I'm going to go through these by explaining to you the typical architecture that you would follow uh if you want to train your own custom large language model so let's start with choosing the architecture and tokens so a large language model you can think of is basically a mathematical function that has just learned to predict the right words given the given some input context and if you want to use this as a mathematical function you'd obviously have to deal in numbers so we use something called tokenization that converts a string of text into a vector of numbers and so the tokenizer splits these words into individual elements and then assigns it a unique number so once you assign the unique number uh you also need to tell the llm when to actually stop generating or otherwise it'll be stuck in an input too and that's why you have this stop token here so now let's look at the brain which is actually what learns these relationships between words and predicts the right word so you don't need to really know all the complex math that goes in that gets involved in making this neural network for this course but I'm just going to give you a quick intuition as to how this works so let's say one of this input sequence like this word here gets inputted you can think of all these layers as random numbers initially and what you can think of them doing is just multiplying them multiplying this input sequence uh yeah these many times and then you would get an output so obviously if all these layers are set to random numbers you're gonna have a pretty bad and random guess so what we do after we call that the output is we look at what word we actually want the model to output and then we compute a difference between our output value that we predicted and the the actual value that we need once we have that the algorithm takes node and it adjusts all of these parameters here uh to step in that direction so once that's done we've pretty much finished one training step and and adding to this overall scale concept this whole training process occurs hundreds of billions of times so you can imagine that even after the first 10 million iterations it's going to get pretty good at guessing what the next word is supposed to be so now more on the training problem I was just talking about the way these large language models are trained is through a next word token prediction task which basically gives the model the question as well as a blank that it's supposed to fill in so here obviously the answer is books so it's given all this context here surrounding context and it's asked to predict the word so these these are just four questions here but the model gets inputted billions of questions from data uh from code to college textbooks to articles to lyrics to podcasts and pretty much any data that you can scrape off the internet so good so now we've effectively trained this model that can predict the next token in our sequence but this is still very limiting because it's just one token we want we want our slm to ideally express ideas and thoughts and actually reason so the way we do that is by actually predicting this next token here and then inputting the predicted token back into the model so that it can predict again and you just keep you know collecting these predictions and you'll get a string of text and um going back to the stop token this is where it's important because once the model is finished with the thought it needs to be informed that um yeah you should just stop the thought so great so now we have this huge llm that's been trained on a bunch of different data and we can use it as a reasoning engine but maybe the llm doesn't have the knowledge that it needs in order to work in your specific use case and this is where fine tuning comes in fine tuning um find you can use fine tuning to further train a model to your own personal context and this is good because you don't need that much data anymore you just need a small label Corpus of your example data and examples of something that you can use fine tuning for is generating custom mid Journey or image generation prompts for another model as well as letting it learn information Beyond its knowledge cut off so you know say um you wanted to teach it about the latest cancer research so a quick note here uh the second example where you're actually teaching it information it's not the most efficient uh when you find you we're going to be using a much better method called Vector databases in this course uh more on that later but generally you'd fine tune if you want to change its Behavior like this example here so now that we have this model with its optional fine tuning you can further take this to production and make it safe so that it doesn't um spear harmful content through RL HF which is reinforcement learning from Human feedback the reason we do this is because the the training data that we scrape comes from a bunch of different sources like articles podcasts textbooks lyrics Etc and this data is bound to have a lot of bias in it and alignment with human values basically fine tunes it to remove the bias so here's um here's what they do what open AI does to fix this issue so basically it's uh it uses a human labeler to just get safe output and reinforce the model to Output these safe tokens okay so now we pretty much went through the entire pipeline which is everything from training to fine tuning to aligning to human values and now we're gonna actually look into how to get the output from our models so the the question that you asked the process of asking a large language model a question is called prompting and the model generates something called a completion which is just a string of text that's likely to complete your previous text inference parameters are another technique that you can use to enhance the creativeness of the model and we'll be trying this out in the chat gbt playground but these are the four parameters you can change so these two basically make your outputs more random by changing a lot of things like the window for the probable words to how much these low probability words are weighted add frequency parity and presence parity are the other two inference parameters and they basically make sure that your model doesn't output the same answer in the same style for every single for every single identical question that you ask it so now let's go ahead and try this out in the chat GPD player so quick ignore from this future uh if you already know the basics of chat GPT and calling the API You Should Skip the next two sections and come to the third section where we'll be learning how to clone the entire chat GPT user interface so now let's explore the playground environment so in your browser just type in chat GPT breakdown and you should be prompted with this user interface here another way you could do this is just through the open AI website itself where you just log in and done okay and just hit API here and just hit play dot Okay so the reason why we're choosing to use the playground over the usual charge gbt user interface is because the playground will give us a more customizability as well as a better feel for the actual API that we're going to be using throughout this course so let's get started from um left to right in this column here you can add in your system message uh like and say you are a programmer something like that and basically the system message assigns the llm's personality throughout the entire conversation so here you can maybe say something like our high could end out the first 10 and date sheet so it pull it okay defense and it should give you some Outlet okay so as you can see it does give us the output and one thing to notice here is that there's always an alternation between the user and the assistant and so in the API as well when we script some kind of conversation we're going to be using this so just remember that it's only valid it it's user followed by assistant so now let's move on to this main column here that is going to let us customize a bunch of different parameters so uh it's generally recommended to keep the mode in chat but you can explore the models here and we have options between the versions of the GPD 3.5 line and the gpt4 line and basically the difference between GT4 and gdt 3.5 is uh gpt4 is slower but it's a lot better at logical reasoning and creativity related tasks because it's just trained on so much data and gbt 3.5 is obviously the opposite is faster but it's it's a little worse at uh you know these tasks and so there's a speedo there's a trade-off between intelligence and speech here but in general I would recommend using GPT code so we can just give it another message maybe I could write a poem and yeah so as you can see uh the previous suffers so much faster than how it's outputting now and this process of like stringing together these texts live is called streaming where you're not uh you're not just presenting uh the whole text after it's all done processing you're just doing it like it is putting it together so now once we have this we can actually change the other parameters here these in press parameters so temperature is a measure of creativity as I've mentioned before so when you put temperature at zero it's very deterministic so it's useful when you're trying to just try to understand some kind of data and when temperature equals one it's a lot more creative so you can you know use it for a bunch of different uh poem writing or just just like arguments in general maximum length is pretty obvious and so just we can uh just test out the deterministic part of this by comparing to outputs so we can write direct code for a lot thing sorting and dirt from scratch so I'll just copy this and uh submit this and while that's executing uh I'll just go over like topi is another example well that allows us to control the modern's creativity and it's only advice to you is one of these parameters uh at a time uh either temperature or top B so moving on we have frequency penalty and presence polity and if you want the model to Output a different answer for every same question or just cut down the number of repeated words in an answer need to use this frequency parity so anyway let's just so we did this when temperature equals zero so now let's compare this output to run temperature equals uh something higher like well so while that's happening uh as I said before frequency penalty uh basically penalizes based on how often a word occurs by repeated verticals and presence penalty penalizes the model based on uh just whether or not the word exists so let's just compare these two outputs with different temperatures so here the algorithm is using bubble sort but here the algorithm is using the home selection sort so as you can see the although the answers will be correct the way it approaches these answers is going to be widely different based on the temperature or the top B so another quick thing that we can do here that makes our life a lot easier is when we want to use this in the code which we'll get to in the next session you can just hit this view code button it'll give us everything that we actually need to get started so all we have to do is just copy this code and I'll see you in the next video where we're actually gonna do work through this API in our YouTube channel you should find this notebook Linked In the description below and this is something this is called a collab notebook which is just an online environment that helps us run our python code so to get started all you have to do is just press shift and enter and it you can just go through all this collab notebook so we'll just wait for all the necessary python packages to install and these packages basically let us call the open AI API and actually access charge apt so let's just wait for that to finish and let's go through what this open ebi game means middle black or not okay so now let's move on to this API key and so what is an API key since open AI builds you on using the API you'll need a password so that no one can get access to this but you and that's what the API key is it's just a password to your API so obviously I'm going to be revealing this password to you guys but I'm going to deactivate it as soon as this tutorial ends so I'm going to show you guys how to do all of that so you don't have to worry even if someone gets access to this key you can just disable it so what you would do is you would go out to your uh you know this account and you would hit view API keys and once you're in this dashboard you can just create a new secret key you can call it anything so for this one I'm probably just gonna follow course Radio 2 for one and create the secret I'm going to copy this now and once it's copied I can just yeah I can just paste it in here so this will be a password that's unique to you and say you've accidentally revealed this what you can do is you can actually just hit the trash icon here called edible key and it basically disables the gear so you can't access it through this key anymore I'm not gonna disable this but I've made a couple of ones here that I'm going to disable and yeah I'm going to be disabling this API key as soon as this video ends so let's just shift and enter and that runs this code so your environment variable is now opening Aid and this is just an example of how we call the epr I'm not gonna use this as the example I'm gonna go back here and go to the playground and use this uh example that open air has provided so in this case let's just hit new chord and then quality yeah let's put there okay so we can remove this open ABI Cube if there's already defined that up here and now this response body in this content here which is this part here we can just say what is your name and I'll show you guys the other games after we're done with this so as you can see it does run but it doesn't just return us with the response so what I can do is I can just uh the whole value is um in shorting this response variable so I can just print out response and as you can see it gives us a bunch of the state up here and the only thing we're actually interested in is this part here right it says digital assistant so notice how I decided it all equals you through here but here it's all equal to assistant the content is I am in the AI language model developed by open AI I don't have a personal name whatever so obviously this this is the only thing that we want the model the output and everything here is just uh something that we shouldn't be you know using so what we can do is we can just extract just to reply to this and so this goes through the entire Json data and it just gets us the content here so it's the same thing here let's run up here so another thing you can do is again just um everything that you do in this playground you can do here so you can put on system message here and the way you do that is just by defining this other dictionary and you put the row a system and the content does whatever the system as it is so here you can see your helpful assistant um you know obsess like potatoes and shift enter again it should take a little while but we should be clicking on the output pretty soon yeah so here as you can see it does assign that personality to it because you know it's just potatoes but it also just completes the task as well another thing that you can notice in this example is I have the power to change everything that I've changed in this pretty loud so I have the bar to change the model the messages the temperature and if you come up here you can actually see that everything that I changed up there you can sort of just assign a general variable and change it I'm not going to be doing that because I don't think it's very necessary for this tutorial but we can look to some more prompting now so one thing to note is that GPD 3.5 doesn't really pay attention to the system message as much so generally whenever you want to assign a system message you should be deported so let's go to this example here which is few short prompted so if you guys remember what I said about fine tuning in the previous in the slides explanation this is sort of analogous to that when you're sort of giving the model example answers you're scripting this entire conversation here so you're saying that the system method is this the user has inputted this and the assistant has responded like this so this didn't actually happen but you're saying that this is like the ideal uh response that the assistant should be giving you and once you give it lineup examples it sort of learns uh to you know output the answer based on how the user desires it so here let's just change the model to gpd4 and we can run this code there yeah so as you can see it does give us this output we can try this well we can track this is sort of a simple example but we can try this maybe Maybe if so I'll just uh what liquid so as you can see like we can compare the output that it's given previously with the few short prompting and now and I would say like the output before was a lot more concise and in-farm than whatever this is so again another example of things short prompting where we can actually just assign the system message as something that follows pattern so that it it can transistor Mom it enhances whatever it learns so I highly recommend you guys to just go into this notebook it is just your API Key by connecting with your car and just simulating a bunch of conversations and tweaking a lot of these parameters here and so I'll get a feel for it because there are no ideal set of parameters for every um for every problem much like there is no ideal dump for every problem it's a process that comes from just uh you know keep or driving error and just trying out a bunch of different solutions so one thing I didn't mention so far is that like this API is actually built so and the way open AI charges you for this for using this API is through the number of tokens so you can think of a token as I've explained before as three-fourths of a word approximately and so for every three-fourths of a word every thousandth reports of a word you would be charged differently based on what model you're using so this took this Library here um sort of helps us understand how much we're using it but we are a much better way to do this is just going through uh is going through your manage account and just looking through uh how much you're being built because openm provides you with all the data and accessory we're here is just a way to do this programmatically um you can just this is just something where you just copy paste the code in and it helps you with that there's really no bonded understanding what this means okay so as you can see it's counted 129 prompt tokens so you can just do the lab and see how much that charges so in general the high-end models like gpd4 are slightly more expensive than gbt 3.5 but I would say the price is um pretty minor it's like 0.03 dollars or something or a thousand tokens per one of the models so that was it for using the API uh hopefully I've given you guys a much better intuition on how this works so uh one last thing that we can do is actually the finding our prompt through error Labs so as you can see all of this was just asking it a question and what we continue to sort of help with asking better questions is asking the model itself to make a better question so what we can do is like uh can you trace this question to be more so here consists and to the point and then you would just ask it like what is python or that's pretty concise already maybe we could do like um if I had three apples and my brothers father Eric or how many do in total and so we can just you know run that thank you yeah so it makes it a lot about concise and you can use this as your prompt instead of your other prompt and the reason why this is helpful because is because you can cut down the token size and so you know open AI charges you less we're using more concise plugs so you can just say reduce this and hit submit yeah so this is just a match token so uh over time if you keep calling some prompt you can just optimize it through charge GPD so so that was it for prompting now let's go into actually applying this into some projects my first project will be cloning the entire chart GPT UI and assigning it a custom personality so that we can interact with uh you know custom characters let's unveil it users we're gonna be using the chains that package in order to make our chat GPT column so chain lit is basically this framework that allows us to build user interfaces really easy if you have had experience with a package like streamlit you can imagine chain that is like streamlit but for a large language model applications so comes with a bunch of different features and unique Integrations and ideally uh this is what our end goal is going to be as shown by this video so just skip through yeah so as you can see it's a pretty pretty feature-itch user interface and now let's get started building this okay so now let's actually get started with uh cloning this user interface the first thing that we're going to want to do is import our change that package and the way we install it before we import is we go to the terminal and ignore everything that's happening here all we need is uh just a python environment and you will do this command so bip install chain lid and we're doing this in the terminal tab so once you do that it should install all your packages and if you haven't already you'd want to do pip install openai as well and so this basically installs all the packages from external source that you can use to you know work with chain bit and all these features okay so now let's start our first goal will be the created user interface that just outputs everything that the user inputs let's start by doing defining a tag so so on message you would Define a function uh main this would be as asynchronous don't worry about async because that just means that it's going to wait for the user to send the message so it start executed immediately so basically this function main takes in uh the parameter of message which is supposed to be this is going to be a mapped to a string so this message is going to be whatever the user inputs and all we're going to do is we're going to say await chain that scl CL dot uh message and then we're going to send this and right now what it's going to do is it's just gonna return an empty object so what we have to do is we have to actually uh specify what we want in the contract so in the content here we can uh yeah so in the content we can just say message addressing message because that's what the user inputs and I think yeah so this is about it for our basic example what we can now do to run this and actually see our user interface is in your terminal again uh you have to learn to get really familiar with your terminal because it can be really useful for the upcoming projects so go to terminal and just run chain later on and look at your bicon file's name minus main.py here so I'm just going to put main.py and then I'm just gonna use the W flag so you should really remember that um you know you should remember that you're not supposed to run a chain with program with your run but run button so I'll make this more clear uh after we run this command okay so as you can see it's running on localhost Port it calls it is just just just so that we have complete Clarity we will be using the screen button to drop something for code but this executes it in a server and chain that helps us you know deal with the server and all of the back end so all we have to do is just run that commit and now if we just put it in high uh you know whatever uh whatever the user inputs the chatbot will out so for some of you running this for the first time it might not actually look like this you might have a bunch of random text that's appearing there like welcome to chain lit and the reason why that's happening is because of the chain the dot MD file so here's just a sneak peek up where we're going to be covering in the future but for now go to this channel.md file as you can see mine is empty but yours might be just full of a bunch of text and all you have to do is that none of the text here is imported so you can just you know Ctrl a and delete all of it or you can add something like welcome to this interface Begin by sending uh wait that's it or something like that so once you do this hit command s which saves it as you can see because of our W flag that we put here uh the server actually watches for changes so one as soon as we hit the Ctrl save it says file modified change the dot MD preloading app and it should be reloaded yeah okay so it as you can see it says welcome to this interface Begin by sending a map subtract there's a CFI and it goes away but as you can see it just outputs everything that we have a couple of more options that we have in this user interface is hiding you know chain of part and expanding these messages but more broadly you know you can double between dark more in light mode I'm gonna stick to dark mode for this tutorial okay so now we pretty much have you know the basis and the user interface all we have to do now is pass the message into into the API and we should and then you know just dot send the answer right so let's let's do that now and this is going to be pretty easy because we've already done this uh bunch of times in the collab notebook what we're gonna do is just you know wait for okay so here all we're gonna do is maybe just make something called response here and in this response object we're just going to put uh the chat completion and dot create and in this we're gonna put our model our uh we're just gonna leave that empty for now and our messages again MD for now and maybe your temperature so why not come for a shirt equals something so you you need to remember to put commas after all of them except for the last one and commas between the messages as well so the model can be anything you want another gbd4 and the messages is an array of dictionaries so in the dictionaries you'll need to pass two parameters you'll need a role and you'll need the contact and I'll explain what that is in just a second so again roll or whatever that is and then the corn bat a temperature you can decide to build you know whatever you want between zero and one or two and once we're done with this we gain a tourism singer yeah it should be fun Okay so so since so in our role uh key value pair we can just put this as the assist step and here we're just gonna put it as the user so as you can see here basically what happens is that this is the assistant message as you've seen in the chat GPT playground and this is saying that uh you you're like asking the aggression basically so in the content for the system message you can put you are uh helpful assistant and in the in the user message we're actually going to pass in our message variable that's passed on by the user and once we have this we instead of you know content equals whatever the user messages we're just gonna return the response from childcpd Okay so now let's run this again same command in terminal okay so it's saying uh no idea keep provided let me provide will make the idea and so maybe uh we can just we'll store it started and then we can just put this in so and skip it save that hey then just say just something I'm going to draw it here and I'm not quite sure we're going to drop so let's try to debug this maybe uh okay so maybe what we can try doing is just try this response into a string and hit save let's let's see if this works okay perfect so yeah um basically the mistake I made was that this was just returning just an object a Json object and the content on the uh support strings so as you can see in this object again uh let me just you know redo that okay so here to my message hi it says content is hello how can I assist you today and obviously we don't want any of this other stuff around it and as I've shown you guys in the collab tutorial all we're gonna do is just uh remove these brackets and then we're gonna index our way through to get our message so we're gonna go to choices and then inside choices we're gonna pick the zero the element and then the zero element we're gonna go to the message and then send message where it went to go to content okay so all of this is going to be wrap in an app string so I'm just gonna put this like such and then you just remove this and then you put this here Okay cool so once we did once we do this let's just put a cover here where say I'm not 60 and then Mom they're saved about so now let's try running it again oh yeah okay so it works you can uh uh you know tell me a short one like story yep so as you can see we've successfully cloned the chat GPD user interface and we can also uh call our API through this and modify any of our parameters here to suit the content now what we're going to do is something interesting where we can sort of talk to the model um that is assigned a specific personality so what what I mean by that exactly is we're going to be able to change the user message to something that changes its personality so here you can say like you are an assistant that is obsessed with power rabies Legolas all right so once you do that you can just hit Ctrl save it should modify a bunch of times and you know you can just start by sending a message so Phi and once we wait for a while it should return us the output so yeah it it follows the system message and it really pays attention to it because as you can see here you know it's obsessed with legless so this is how you can assign uh personality to it and this is how uh if you've ever gone to GitHub training repositories there's a bunch of these uh you know different different uh repositories that all they do is just you know change the system message to make it behave a certain way and that's exactly how you're able to get this Behavior without fine dma so next let's address some limitations of this approach so we have successfully built uh working chat gbd clone but there are a couple of limitations to this regarding the user experience so there's no streaming involved and streaming is basically this live uh token prediction of outputs that happens instead of just processing the output all at once and passing it directly there's no generating the messages um Mass uh you know button or message that basically helps the user actually identify it the backend process is running because otherwise you'll just have to wait without any confirmation on whether or not anything is actually happening on the back end and additionally after that there is no back-end context and by that I mean the user doesn't know what kind of llm or what is actually running in the background this third feature is pretty optional but it's uh very useful when it comes to debugging when we go into more complex large language model systems so let's see how we ideally wanted to look like this is from the official chain that website as you can see there is streaming here you know these tokens are being spit out live there is this background step that I talked about and it reaches to rewind the video again uh you can see that that stop task button there allows us to actually see whether or not this llm is actually generating and we are able to stop the flow so how do we get here the answer to that is through Nang chain so line chain is the most popular python library that helps us deal with error ramps it has some pretty Advanced functionality uh you know excluding whatever we just talked about here and we're going to be using this extensively in later parts of our course when we're going to web browsing agents and using our tools with agents so anyway let's get into our land check implementation now great so now let's look into how to actually integrate Lang chain so that we get access to all of those user-friendly features this is what we're going to be working with now uh I'm just gonna you know stop that app that we just ran for running and we're gonna pip install and this is very important U Lang chin and what this EU flag does is it basically installs the very latest version of blank chip and the reason why that's important is because line chain is a library that gets updated pretty much you know every week or so so anything that works today might be depreciated or discontinued next week and in order to you know keep all your dependencies in check we're going to be having to you know update our land chain so we I just defined a random string here so this template just gonna allow the llm to think step by step as it does here and the first feature I'm going to show you from Land chain is prompt template so if you guys have ever worked with the python format function this is pretty much the same thing what I mean by this is uh this is just vanilla python so I'm just gonna do uh template dot format and then inside this question equals uh you know whatever you want like what is one two three four whatever so let's just I'll happen on this okay my bad um you're supposed to print this up okay so let's just print this up yep so what it does is it takes this template variable and then it formats it to one two three four but how's that actually happening well these curly braces indicate that this is an object that needs to be formatted so everything inside this curly braces is going to be replaced by uh what is one two three four and that's exactly what prompt template does but it makes it a lot easier and a lot more llm friendly so we're gonna get right ahead with the line chain and change that implementation we're going to just Define two tags here we're gonna do CL dot on message as well and once we do this we are going to yeah so we are just going to create a main chat so we're gonna just Define our main function here and inside this we're gonna do our prompt equals something our llm chain equals something and then CL dot user session dot set all right llm shin tool as an M chain so and chain you can set that to llm shin okay so now that we have that let's actually uh go into what we're doing here so what on chat star does is as soon as this object of you know the chain lit UI is deployed here are the variables that we need to initiate so I'll just um yeah I'll just create the prompt here so prompt template is the object and set this we're gonna put our template as template and it takes some random term I take some template and with this it takes an input variable so input variables because you have something else so in this case our template would equals the template variable and our input variable is the stuff inside the girly bracket so that's this question okay so that's it for our prompt and now let's initialize what our llm shape so what is an llm chain here and llm chain for now you can think of it as something that connects the prompt with our large language model so in this case let's just uh foreign a bunch of uh parameters so we're going to do prompt equals something or else the lamb equals something and our streaming equals something and our employables equals something so I'm I'm gonna go through all what all of these mean and the reason why it's showing that is because it needs to be commas so our prompt is basically just our prompt uh which is the variable I Define here uh and our slm which is going to be the regular open AI model but there's a different way to define it now instead of you know going through the hassle of doing a bunch of different uh you know API calls all you need to do is just open Ai and when we pass in our temperature temperature here and my bad streaming is supposed to be uh parameter inside the alarm itself so uh streaming you can just set to true and this would you know stream yard and temperature will set the one for that after this verbose is basically uh our hot process this will make more sense when we cover the agents tutorial but you can think of verbose for now as just this extra additional text that goes into and you know helps the llm with resume so the thought process that leads up to the answer and then we're going to take this llm chain and then we're gonna store it in a user session variable called llm chain so that we can access it on the on message call so in our on message call where you know this we've done this one already let's define a mean another main function and there are message B string whatever and inside this we're gonna retrieve the chain from our user session so llm chain equals CL dot user session dot get so this time we we did get instead of set and in here we're just going to build llm Shake which is the variable that we passed in here okay so that basically just gets us this variable across tags and after we've done this it's just pretty simple now uh all we do is just uh call it a desert result variable so instead of calling our model itself we'll now be calling our llm chain so what we do is await llm underscore chain dot asynchronous call and then inside this we're gonna do our message and then call back so don't worry about what this callbacks thing is it just helps with uh streaming uh it helps its streaming because you know it just calls back and establishes a socket an action from what I understand with it and then afterward we do this all we do is just return or output as we've always done so CL Dot message and then we're just gonna sell you that and then here we're going to do result address um text so as you can see the although this may not save a lot of you know lines of code and might be slightly more we have a much more organized framework to think about things now because once we adopt this land chain framework uh you can you know sort of do stuff that's a lot more complicated than what we're doing within similar lines of code so an example of this is what we're going to be covering next uh you obviously you're not expected to understand if this right now but all of this is very close to uh how many ever we're doing here and this is infinity more complex than what we're attempting to do here so once we're done with that let's um go back and fetch our API key okay so once we're done with just putting our API Keys uh we can actually test this out and look at all of our new features so let's change the drawing and then change that run and I file name here's 19. integration right yeah Dot py and then we're just gonna put a watchdog there then so just wait for that to run okay great so now if we put high as you can see we get everything that was in the media that I showed you so hi what is your question about you know a thought process here is there's a question about this thought process stuff is going to be more relevant once I look into other Concepts but this is basically the same thing you can just say uh what is your name how are you doing so and it's actually streaming here and then passes us the final output bad yeah so everything that you did previously with your API you can sort of uh test out working with our system message here uh you can desktop working with different user messages different parameters like dog B Etc and now let's get into something that's slightly more interesting we're gonna be able to use this line chain framework to ask chat GPD questions on our own PDF and text documents that could be you know any size like 2000 pages so more on that now so now let's talk a little bit about battery databases and embeddings here are some examples that you might have heard of we're going to be using chroma DB and some of the others for this course so what are vector databases what two databases are basically this database or storage for specifically embedding information Vector databases allow us to query and utilize the this embedding information as fast and efficiently as possible so what's an embedding well and embedding is just this multi-dimensional space where all the similar objects are grouped together so in this case um in the Simmons all the symbols that represent two are sort of grouped together all the symbols that represent four are grouped together and all the symbols that represent nine are grouped together so why is this important well when you have the ability to group some similar objects based on a bunch of different parameters together you can build recommendation systems you can build search engines if this is also used in L Adams themselves in generative Ai and for this specific case we're going to be using it for context window expansion so if you remember from one of the previous slides in the introduction I said that fine tuning is not a very recommended way of enhancing a model's knowledge well this is where Vector databases come in because while fine tuning has its disadvantages because of catastrophic for gaming Vector databases just simply retrieve the relevant context information for the language model so it can use it what do I mean by this well let's get into this let's get a little more technical into this so say I have this 2000 page PDF which I want to use and I have a couple of questions about it and I want to be able to ask sharp CPT or any other Ram uh this question in general where if you'd want to do this I guess the naive way would be to just copy paste every single letter in that um in that book and just paste it into the chat GPT terminal window but obviously this won't work because you'd be hit with a context limit because the there's a max number of tokens that can be inputted into chat GPT and the way to get around this context help it work with new information is where Vector database is coming so basically I'm just going to put this PDF into a text splitter which will split it into text chunks of equal length so you can imagine like just five words sentences or a thousand word sentences and once these chunks are made they put into something called an embeddings generator and these embeddings are then stored in our Vector database so now what's cool about this is we can actually ask the question and the vector database performs an action called cosine similarity and what cosine similarity is is it finds the nearest 10 or however how many are you want uh 10 relevant sentences Within These embeddings so the 10 closes sentences and then it just outputs them so that's what the embeddings and the vector database do and this Vector database returns those 10 relevant sentences and these 10 relevant sentences go into a question answering model along with our question so that it can be answered and the reason why this is important is because we went from a 2000 page book to 10 relevant sentences based on any query and to me 10 sentences is obviously a lot more manageable and useful in a model than you know a 2000 page book so why use Vector databases I have said before like cosine similarity is the only function that Vector B databases actually perform along with storage so why can't you just put all these embeddings into something like an array and do a cosine similarity function with them it's certainly possible but the reason why Vector databases are just so popular and being used so much is because they have clever algorithms that help us retrieve all these relevant text information at super fast speeds along with a lot more efficient memory efficient usage additionally it's also very convenient because we just we can just retrieve those 10 relevant sentences through one simple function call so this is going to be the architecture for project 2 as I've already mentioned we're going to be able to we're going to build this entire Pipeline and one very important distinction I want to make here is the difference between our embedding generator and Q a model if you remember from the number classification example I told you that the task was digit classification and it's able to group those uh you know symbol similar symbols together this is very similar but here you're doing that next um you know that next token word prediction problem and this is there's an important difference here because the embeddings generator neural network and the Q a model are very different the embeddings generated neural network for this case we're going to be using something called Adda 2 because it's a lot more efficient and it's sort of um you know the standard for generating embeddings whereas the Q a model is going to be something like gpt3 and gpd4 um in in theory it is definitely possible to generate embeddings through gpt3 and gpt4 but add a uh tends to perform better and is a lot cheaper so now let's get into the code so now let's get some more hands-on experience with how Vector databases work basically what we're going to be using for this tutorial is chroma DB which is an open source Vector database that we can run locally on our machine as pretty scalable for production so the way we get started is we just do pip install chroma DB and it should be installing everything but if you ever face any problems with installation and I say this because I have personally faced a problem of editing out here when I try to install this package for the first time the way you fix that is running this command that I've commented out here so if you do that the you know the errors will restart the clocks out once we do that we can set our chroma client equal to you know just our Chrome idb so that we can start querying our our question so here we can do collection and then inside this you do chroma client dot create a collection and you can think of a collection as basically this place where we store our embedding so this is the actual uh Vector database so so my correction will be our Vector database this is supposed to be chroma okay and now since we say that it's supposed to be our Vector database we can add information to this Vector database in the form of three variables so there is a documents variable there is a metadata variable and then there's an ID variable so our collection object contains all these three and let's go through them one by one and understand them step by

Original Description

Welcome to this course about development with Large Language Models, or LLMs. Throughout this course, you will complete hands-on projects will help you learn how to harness LLMs for your own projects. You will build projects with LLMs that will enable you to create dynamic interfaces, interact with vast amounts of text data, and even empower LLMs with the capability to browse the internet for research papers. This course was developed by @Trainer.ai_app Colab notebook for introduction to the API:https://colab.research.google.com/drive/1gi2yDvvhUwLT7c8ZEXz6Yja3cG5P2owP?usp=sharing Github : https://github.com/pythonontheplane123/LLM_course_part_1 Join the Discord: https://discord.com/invite/Hq39QgRU Twitter : https://twitter.com/AkshatNon ❤️ Try interactive AI courses we love, right in your browser: https://scrimba.com/freeCodeCamp-AI (Made possible by a grant from our friends at Scrimba) ⭐️ Contents ⭐️ ⌨️ (0:00:00) Brief introduction to LLMs ⌨️ (0:11:49) Quick note from the future ⌨️ (0:12:04) Chatgpt playground (skip this is you know this already) ⌨️ (0:18:21) GPT API basics (skip this is you know this already) ⌨️ (0:30:43) Brief intro to chainlit ⌨️ (0:31:33) Cloning chatgpt user interface ⌨️ (0:45:37) Limitations of our interface ⌨️ (0:47:48) Adding streaming, backend view, stop sequence button ⌨️ (0:58:42) Introduction to vector databases ⌨️ (1:04:42) Vector databases hands on ⌨️ (1:12:10) QnA with Documents - .txt and .pdf ⌨️ (1:23:32) Testing out our Q&A system ⌨️ (1:27:22) Introduction to web-browsing and agents ⌨️ (1:32:52) AI researcher ⌨️ (1:42:23) Human as a tool ⌨️ (1:44:44) Mini code interpreter plugin(Replit tool) ⌨️ (1:46:29) Searching youtube using agents ⌨️ (1:49:19) Guide to explore more ⌨️ (1:50:33) Shell Tool ⌨️ (1:55:43) Create your own tools ⌨️ (2:01:19) Ending Notes 🎉 Thanks to our Champion and Sponsor supporters: 👾 davthecoder 👾 jedi-or-sith 👾 南宮千影 👾 Agustín Kussrow 👾 Nattira Maneerat 👾 Heather Wcislo 👾 Serhiy Kalinets 👾 Jus

Watch on YouTube ↗ (saves to browser)

Sign in to unlock AI tutor explanation · ⚡30

Playlist

Uploads from freeCodeCamp.org · freeCodeCamp.org · 0 of 60

← Previous Next →

React: Production Server Setup Part 2 - Live Coding with Jesse

React: Production Server Setup Part 2 - Live Coding with Jesse

freeCodeCamp.org

cookies vs localStorage vs sessionStorage - Beau teaches JavaScript

cookies vs localStorage vs sessionStorage - Beau teaches JavaScript

freeCodeCamp.org

Browser history tutorial - Beau teaches JavaScript

Browser history tutorial - Beau teaches JavaScript

freeCodeCamp.org

Graph Data Structure Intro (inc. adjacency list, adjacency matrix, incidence matrix)

Graph Data Structure Intro (inc. adjacency list, adjacency matrix, incidence matrix)

freeCodeCamp.org

React: Parameterized Routing with Next.js - Live Coding with Jesse

React: Parameterized Routing with Next.js - Live Coding with Jesse

freeCodeCamp.org

React: Dealing with jQuery Issues - Live Coding with Jesse

React: Dealing with jQuery Issues - Live Coding with Jesse

freeCodeCamp.org

setInterval and setTimeout: timing events - Beau teaches JavaScript

setInterval and setTimeout: timing events - Beau teaches JavaScript

freeCodeCamp.org

Browser and Device Testing - Live Coding with Jesse

Browser and Device Testing - Live Coding with Jesse

freeCodeCamp.org

Last Minute Updates - Live Coding with Jesse

Last Minute Updates - Live Coding with Jesse

freeCodeCamp.org

Post Launch Updates - Live Coding with Jesse

Post Launch Updates - Live Coding with Jesse

freeCodeCamp.org

React: Setting Up Google Analytics - Live Coding with Jesse

React: Setting Up Google Analytics - Live Coding with Jesse

freeCodeCamp.org

React: Masonry Layout - Live Coding with Jesse

React: Masonry Layout - Live Coding with Jesse

freeCodeCamp.org

Load Balancing Digital Ocean Droplets - Live Coding with Jesse

Load Balancing Digital Ocean Droplets - Live Coding with Jesse

freeCodeCamp.org

try, catch, finally, throw - error handling in JavaScript

try, catch, finally, throw - error handling in JavaScript

freeCodeCamp.org

Load Balancing: SSL Passthrough Setup - Live Coding with Jesse

Load Balancing: SSL Passthrough Setup - Live Coding with Jesse

freeCodeCamp.org

Graphs: breadth-first search - Beau teaches JavaScript

Graphs: breadth-first search - Beau teaches JavaScript

freeCodeCamp.org

React: Masonry Layout Part 2 - Live Coding with Jesse

React: Masonry Layout Part 2 - Live Coding with Jesse

freeCodeCamp.org

React: WordPress API Live Search - Live Coding with Jesse

React: WordPress API Live Search - Live Coding with Jesse

freeCodeCamp.org

Creating WordPress Custom Post Types - Live Coding With Jesse

Creating WordPress Custom Post Types - Live Coding With Jesse

freeCodeCamp.org

Dates - Beau teaches JavaScript

Dates - Beau teaches JavaScript

freeCodeCamp.org

Miscellaneous Front End Updates - Live Coding with Jesse

Miscellaneous Front End Updates - Live Coding with Jesse

freeCodeCamp.org

Merging a Pull Request from GitHub - Live Coding with Jesse

Merging a Pull Request from GitHub - Live Coding with Jesse

freeCodeCamp.org

React + Prettier + Standard JS - Live Coding with Jesse

React + Prettier + Standard JS - Live Coding with Jesse

freeCodeCamp.org

React: Sortable Responsive Table - Live Coding with Jesse

React: Sortable Responsive Table - Live Coding with Jesse

freeCodeCamp.org

Geolocation Sorting by Distance - Live Coding with Jesse

Geolocation Sorting by Distance - Live Coding with Jesse

freeCodeCamp.org

Tradeoff Matrix - Agile Software Development

Tradeoff Matrix - Agile Software Development

freeCodeCamp.org

The Definition of Ready - Agile Software Development

The Definition of Ready - Agile Software Development

freeCodeCamp.org

Getting first React job without experience - Ask Preethi

Getting first React job without experience - Ask Preethi

freeCodeCamp.org

React: Google Analytics Click Tracking - Live Coding with Jesse

React: Google Analytics Click Tracking - Live Coding with Jesse

freeCodeCamp.org

Submitting a PR to an Open Source Project - Live Coding with Jesse

Submitting a PR to an Open Source Project - Live Coding with Jesse

freeCodeCamp.org

Should I go back to school to get CS degree? - Ask Preethi

Should I go back to school to get CS degree? - Ask Preethi

freeCodeCamp.org

Hero Section CSS Changes - Live Coding with Jesse

Hero Section CSS Changes - Live Coding with Jesse

freeCodeCamp.org

Working Agreement - Agile Software Development

Working Agreement - Agile Software Development

freeCodeCamp.org

A day at Pennybox with Co-Founder Reji Eapen

A day at Pennybox with Co-Founder Reji Eapen

freeCodeCamp.org

React: Sorting and Filtering Data - Live Coding with Jesse

React: Sorting and Filtering Data - Live Coding with Jesse

freeCodeCamp.org

React: Sorting and Filtering Data Part 2 - Live Coding with Jesse

React: Sorting and Filtering Data Part 2 - Live Coding with Jesse

freeCodeCamp.org

React: Building a New UI - Live Coding with Jesse

React: Building a New UI - Live Coding with Jesse

freeCodeCamp.org

Definition of Done - Agile Software Development

Definition of Done - Agile Software Development

freeCodeCamp.org

Getting started with jQuery (tutorial) - Beau teaches JavaScript

Getting started with jQuery (tutorial) - Beau teaches JavaScript

freeCodeCamp.org

Making a React Blog with WordPress Content - Live Coding with Jesse

Making a React Blog with WordPress Content - Live Coding with Jesse

freeCodeCamp.org

React, NextJS, CSS - Live Coding with Jesse

React, NextJS, CSS - Live Coding with Jesse

freeCodeCamp.org

jQuery events - Beau teaches JavaScript

jQuery events - Beau teaches JavaScript

freeCodeCamp.org

React/NextJS Routing and WordPress API Custom Types - Live Coding with Jesse

React/NextJS Routing and WordPress API Custom Types - Live Coding with Jesse

freeCodeCamp.org

React: Working with API Data - Live Coding with Jesse

React: Working with API Data - Live Coding with Jesse

freeCodeCamp.org

React: Refactoring Components - Live Streaming with Jesse

React: Refactoring Components - Live Streaming with Jesse

freeCodeCamp.org

jQuery effects - Beau teaches JavaScript

jQuery effects - Beau teaches JavaScript

freeCodeCamp.org

More React Refactoring - Live Coding with Jesse

More React Refactoring - Live Coding with Jesse

freeCodeCamp.org

animate in jQuery - Beau teaches JavaScript

animate in jQuery - Beau teaches JavaScript

freeCodeCamp.org

"Finishing" My React Site - Live Coding with Jesse

"Finishing" My React Site - Live Coding with Jesse

freeCodeCamp.org

Starting a New React Project (P2D1) - Live Coding with Jesse

Starting a New React Project (P2D1) - Live Coding with Jesse

freeCodeCamp.org

React Project 2 Day 2: Learning Material UI - Live Coding with Jesse

React Project 2 Day 2: Learning Material UI - Live Coding with Jesse

freeCodeCamp.org

The Agile Manifesto - Agile Software Development

The Agile Manifesto - Agile Software Development

freeCodeCamp.org

jQuery: get and set with http, text, val, and attr - Beau teaches JavaScript

jQuery: get and set with http, text, val, and attr - Beau teaches JavaScript

freeCodeCamp.org

React Project 2 Day 3 - Live Coding with Jesse

React Project 2 Day 3 - Live Coding with Jesse

freeCodeCamp.org

The INVEST approach to product backlog items

The INVEST approach to product backlog items

freeCodeCamp.org

React Project 2 Day 4 - Live Coding with Jesse

React Project 2 Day 4 - Live Coding with Jesse

freeCodeCamp.org

Chickens and Pigs - Agile Software Development

Chickens and Pigs - Agile Software Development

freeCodeCamp.org

React Project 2 Day 5 - Live Coding with Jesse

React Project 2 Day 5 - Live Coding with Jesse

freeCodeCamp.org

jQuery: add and remove DOM elements - Beau teaches JavaScript

jQuery: add and remove DOM elements - Beau teaches JavaScript

freeCodeCamp.org

React Project 2 Day 6 - Live Coding with Jesse

React Project 2 Day 6 - Live Coding with Jesse

freeCodeCamp.org

This tutorial teaches development with Large Language Models using OpenAI, Langchain, Agents, and Chroma, with hands-on projects to create dynamic interfaces and interact with text data. Students will learn how to harness LLMs for their own projects and create innovative applications. By the end of the tutorial, students will be able to build LLM projects, create dynamic interfaces, and interact with text data.

Key Takeaways

Clone the Chatgpt user interface
Add streaming, backend view, and stop sequence button
Introduction to vector databases
Hands-on with vector databases
QnA with Documents - .txt and .pdf
Testing out the Q&A system
Introduction to web-browsing and agents
Create your own tools

💡 Large Language Models can be used to create dynamic interfaces and interact with vast amounts of text data, enabling innovative applications and use cases.

🔒 Pro feature: Ask AI to explain this lesson →

More on: LLM Foundations

View skill →

Getting Started with Vertex AI Gemini 1.5 Flash

I TRAINED AN AI TO SOLVE 2+2 (w/ Live Coding)

I TRAINED AN AI TO SOLVE 2+2 (w/ Live Coding)

How to use the ChatGPT API with Python!!

How to use the ChatGPT API with Python!!

Nicholas Renotte

Gemini 2.5: Create an interactive plot of economic data

Gemini 2.5: Create an interactive plot of economic data

Google DeepMind

LangChain Chatbots: Building a Personalized AI Assistant

LangChain Chatbots: Building a Personalized AI Assistant

Analytics Vidhya

Auto-generating meeting notes with Python

Auto-generating meeting notes with Python

Related AI Lessons

Sub-10ms AI Workflows: Accelerating sim.ai with On-Device Semantic Search using Moss

Learn how to accelerate AI workflows with on-device semantic search using Moss, achieving sub-10ms response times and improving user experience

Medium · Machine Learning

Stop Guessing: Guaranteed Structured Output from LLMs in Node.js

Learn to guarantee structured output from LLMs in Node.js and stop parsing JSON manually

Dev.to · Hardik Mehta

Spring AI Tutorial — Your First REST Endpoint with OpenAI (2026)

Build a REST endpoint with Spring Boot 3 and OpenAI to create an LLM-powered API, leveraging the power of AI in your applications

Notes: Memory, Context, and Large Language Models (LLMs)

Learn how memory and context work in Large Language Models (LLMs) and potential improvements

Dev.to · Vladimir Panov

5 Levels of AI Agents - From Simple LLM Calls to Multi-Agent Systems

Dave Ebbelaar (LLM Eng)