Large Language Models Bootcamp - Information Session

Data Science Dojo · Beginner ·🧠 Large Language Models ·1y ago

Skills: LLM Foundations 90% · Prompt Craft 80% · Fine-tuning LLMs 80% · RAG Basics 70% · Multimodal LLMs 60%

Key Takeaways

This video provides an overview of the Large Language Models Bootcamp, covering topics such as LLM foundations, prompt crafting, fine-tuning, and retrieval augmented generation, with a focus on practical skills and hands-on experience with tools like OpenAI, LangChain, and vector databases.

Full Transcript

I think we are live right now, so let me go ahead and get started. Hey everyone, welcome to the information session for the Large Language Models Bootcamp. My name is Raja Iqbal, and I am one of the lead instructors at Data Science Dojo. We will be talking about the Large Language Models Bootcamp: what we teach, what our philosophy is, who the typical attendees are, whether you should attend or not, and so on. We are going to answer questions and go over the curriculum and the learning resources that we offer. Before I do that, I'm going to make sure that I'm sharing the correct screen. Okay, I have the right screen there, so we'll get started.

We are one of the oldest companies in the space. We started the Large Language Models Bootcamp, and we are still the first LLM bootcamp anywhere in the world. As far as our data science bootcamp goes, we have a large number of graduates, and not just from the public data science bootcamp. We have done tons of enterprise trainings, and not just on data science; some of our trainings are more for managers, product managers, and marketing managers. We have done them across different industry verticals: FMCG, manufacturing, oil and gas. So we have been doing this for quite some time, with a lot of different attendees and more than 11,000 graduates from our different programs. If you represent a company, quite possibly someone from your company has attended our training at some point in time.

So what about this bootcamp? How did the bootcamp start? We have been a training company, but we also have a consulting and services side, and now a product side. We started building LLM applications; we
started as a services company. When ChatGPT became more popular, companies started realizing the potential, and we started building things for our customers. Very soon we realized that if you go and use ChatGPT or Bard or Claude, building a conversational agent might look very easy if you don't have a very specific business objective in mind. But when it comes to building a real enterprise application, it is hard. It is incredibly hard when you build it right.

ChatGPT does not charge you anything, at least if you are on the free tier, but that doesn't mean it comes for free. When you are dealing with your enterprise data, you have to be mindful of how much money you are spending on these applications. There are different challenges. There's the concept of a context window: in very lay terms, how much data can an LLM handle at any given time? There are regulatory challenges: you cannot just go and deploy a model for any domain anywhere you like. It doesn't work that way. You have to be mindful of the regulatory implications of what you are deploying. Any wrong answers, any leakage of proprietary data, any IP issues, all of that is a challenge. Then there is latency: are you looking at an application that should respond in real time, or is it more of a batch processing application? Is data governance or AI governance a concern for you? Should you go for open source or closed source models? Then there are challenges around lack of reproducibility and hallucinations. Hallucination is a mainstream term now: when your model gives an answer that is not factually correct, we call it a hallucination. The model just connects the dots somehow
and then gives you an answer that is semantically coherent but factually not correct. And how do you deal with outdated knowledge if the data is constantly in motion? How do you evaluate? You know how to handle a traditional predictive machine learning model: a model predicts fraud, the transaction turned out to be a fraud or not a fraud, so you can find out, and there are metrics like accuracy, precision, and recall. But how do you evaluate language? "I went to Seattle last month" and "I visited Seattle last month" are semantically very similar statements, but the keywords used are different. How do you evaluate the output of a large language model? All of these ideas make this whole business quite challenging. As I said, building a simple chatbot in ChatGPT is not a big deal, but building an enterprise application is incredibly hard.

So while we were building these applications (and now we have a product, as I said, with paying customers), we ended up designing this curriculum, created by practitioners to turn other professionals into practitioners. We have a fairly comprehensive curriculum. I'm going to walk you through the entire curriculum, and I'm happy to answer questions about why we are teaching something, whether we are teaching something or not, and what the philosophy is.

This is a 40-hour bootcamp: one week, eight hours a day straight. You can attend it remotely or you can attend it in person in Seattle. We usually have half and half: half of the people are physically in Seattle, and the others are distributed globally. We have had people flying in from as far as the Middle East and Australia; we
have someone flying in from Australia this time. So we have people attending remotely from all over the world and also people attending in person, flying to Seattle.

So what do we do here? A comprehensive curriculum, as I mentioned: we cover most of the mainstream tools and libraries. I say "most" because the ecosystem is huge and it is evolving. We try our best, but we are playing catch-up; the space is changing very quickly. That means opportunity as well, so it's a very interesting space and a fun thing to work on. We cover most of the state of the art, we make changes to our curriculum every time, and we keep improving. Hands-on exercises: the course is fairly hands-on. I will show you the labs and how we deliver that practical experience. On the last day of the bootcamp, we'll give you boilerplate code for an LLM application, and you will deploy it on Streamlit on your own. Of course, we are there to help.

And these are our partners. Since this is a hands-on bootcamp, any GPU cost, any OpenAI or other LLM token cost, any GPU clusters, any deployment-related cost, any serverless functions, anything related to that is covered. Some part of the cost we bear, and some of it is borne by our partners. To give you an idea, RunPod gives us GPU compute, and we give everyone a credit for it: when we are fine-tuning a Llama 2 model in class, you will be given credit on RunPod. Union provides serverless compute for deployment, so we'll get some credit from them, and so on. I cannot go into the details of every single partner; I will mention them as
I go through the curriculum. So you get up to $500 of credit for all software and cloud services. You do not have to worry about any service at all: zero cost to you. Whether you have to deploy a web app or a serverless compute function, get a GPU cluster, or use inference tokens, you don't have to worry about anything. We have got that covered for you.

Our speakers and partners keep changing. We bring in the best practitioners from industry, so our roster of instructors keeps changing. I'm one of the instructors, by the way; I teach about 30 to 40% of the training myself, and then we have other instructors from different companies. This time, these are the partners who are participating in our next bootcamp. We also bring in guest talks from industry. As you spend more time in industry, you realize that technology is not all about software or writing code. You also have to understand the business perspective: there are adoption barriers when you build a product, cultural barriers, and other organizational issues. So we have a good mix. Most of the bootcamp is very technical, but there is a product side to it, and we talk about the challenges we faced in the product that we are building. As I mentioned, we have paying customers for the product.

So I'm going to walk you through the curriculum and give you an overall idea of what we are doing in this bootcamp. Let me first go through this one infographic, and then I will go to the curriculum in the learning platform. If you look at this, it is not the entire ecosystem but a part of the ecosystem, or I would say this
is the general idea of how you would build an LLM application. Many technologies are missing, and you might ask why this or that tool is not there. The intent is to give you a general idea, not to claim that this is a complete depiction of the ecosystem.

At the core of an LLM application you have your LLMs. You can have closed source LLMs or open source LLMs. For instance, you may want to use Llama 2, Llama 3, or Llama 3.1, which are open source models by Meta, or you can use OpenAI's GPT-4, GPT-4 Omni, or GPT-4o mini. You will need some sort of deployed LLM for inference. Many companies are now coming up with their own deployments: Azure takes the models from OpenAI and exposes them as Azure OpenAI, Databricks has its own version, and so on. We'll give you a very good understanding of how to look at the bigger picture, so you can take one component out and plug in whatever works for you and your enterprise. These are just some of the possible tools and technologies you can use.

At the heart of any LLM application, and I'm talking about a RAG application, a retrieval augmented generation type application, is going to be a vector database. Weaviate is our partner, and this session is jointly taught by Data Science Dojo and Weaviate. We will have a presentation by Weaviate, and later we will build a RAG application. We'll talk about many of the issues; bear with me and I will explain what we are going to teach in vector databases. Then you have
different kinds of APIs, and there is this concept of a semantic cache. Each inference call costs us money, so can we bypass the LLM and cache some of the... I see that I am blurred out here. Maybe better now? Okay. Just like caching in software (your CDN, browser cache, server cache, client-side cache, and all of that), you need caching in LLM applications, basically to reduce latency and also to save money. So there is this idea of a semantic cache, and we'll talk about that.

Then there is the whole area around deployment, monitoring, tracing, and guardrails. Think of this area as not much different from MLOps or DevOps. In DevOps we used to talk only about deployment and monitoring of your software; then it became MLOps, and now we are also talking about LLMOps. When you talk about LLMOps, the thing that becomes very important is guardrails. What if your LLM application says things that it should not be saying? Maybe it's a support bot that makes an offer to a customer that is not true. A real example: Air Canada was recently sued because they offered a bereavement fare to a customer who was going from Vancouver to Toronto, and the court ruled in favor of the customer, not Air Canada. They could not say, "Hey, we did not know," or "It was our bot." It doesn't work that way. So we'll talk about the whole idea of guardrails.

Then embedding models: why do you need embedding models? We'll talk about that. And then LangChain. LlamaIndex is there too, but we spend almost six to eight hours on just LangChain. Think of
it as, for lack of a better term, a framework; I don't know how else to characterize LangChain. It's a Python framework, and some of the LangChain tools are now available in JavaScript as well. Basically the idea is: how do you hook up your LLM application to different data sources? How do you work with data in different formats? How do you use different types of models? How do you build memory into your models? These LLMs are stateless, so how do you bring in memory and make them somewhat stateful? We spend quite a bit of time explaining these ideas.

I can keep going, but I should at least mention that in the end we give you boilerplate code that you can deploy. You push it to your GitHub, and from GitHub to Streamlit. Why do we use Streamlit? Because not everyone is a web developer, and not everyone needs to be a web developer. Even if you're a product manager or a TPM, you should be able to finish this exercise. Maybe when you go back you're not going to code yourself, but you're going to be working with your LLM application developers.

One of the most common questions, if it is not already in the list of questions, is: what are the prerequisites? Okay, let me see. I will go here and walk you all through the curriculum. This is our learning platform. It is a companion for the in-person bootcamp. "In person" includes attending remotely: it's virtual, but it is still synchronous and live.

We start with a complete overview of the entire ecosystem: why do you need
embeddings, what are embeddings, and why do you need a vector database? It is very high level, nothing very deep yet, but we spend the first three hours on Monday making sure that you understand the breadth of what we are learning, and also the "why" part of it. Why does a vector database exist? Why does hybrid search exist? What is semantic search, and why does it exist? Why does keyword search exist? Why do guardrails exist, and what problems are they solving? What is prompt engineering, and what could go wrong in prompt engineering? What are token limits? What is a context window? So we set the stage and the context, and you have a fairly good understanding of the end-to-end bigger picture of a large language model application.

Once we are done with that, we start talking about transformers and the attention mechanism. At the core of this whole LLM revolution is this idea of transformers and the attention mechanism. We talk about it in an intuitive fashion, explaining what attention means and what an embedding is, and then in somewhat more detail mathematically: algebraically and geometrically, what do they mean? You don't need to be a rockstar mathematician to understand those things. Luis Serrano teaches this session, and he does a phenomenal job explaining all of these topics.

Then we get into vector databases, and let me use this as a sample session. When we go to vector databases, we are not just going to do some hand-waving. We are going to talk in extensive detail. Zain teaches this module. If you look at this: how are vector databases different from your traditional databases? What does a vector query
look like? Then we start going in depth. What does a vector search pipeline look like, and what kinds of searches can we have? We can have vector search, keyword search, and hybrid search, and we can also have filtering. Why do you need all of those things? Once we have done all of this, we talk about how you organize this data, because you are potentially dealing with tens of millions, if not hundreds of millions or even billions, of embeddings in your vector database. How do you retrieve those embeddings with sub-millisecond response time? We start with the generic approaches, but as the indexes become big you have to compress the vectors, and we talk about all of that in a lot of detail.

These are the slides that we discuss in class during the bootcamp. But that's not it. Once you're done with this, what does it mean to do a vector search, a similarity search, or a hybrid search? You will have access to these code samples. If I click on this, you can see that a notebook pops up. We give you the keys here, and you connect to Weaviate, which is the industry's leading vector database. You create a collection, import the data, perform the hybrid search, and so on. We play around with this: instead of "animal", if you change it to "creature", what difference would it make? If you replace "whale" with "dolphin", what will it do? You can see that there is more: we talk about vector compression. We have already talked about it, but now we are putting it into action, and we will go through these exercises step by step. By the end of this session you will completely, inside and out, understand
how vector databases are used in different kinds of settings. So you can see these are real exercises.

One more thing: why are we doing it in this manner, with pre-written notebooks instead of asking people to write the code? Because we want anyone to be able to attend this bootcamp. You could be a technical product manager: you can read code, but you are rusty on your coding. That's okay, because the way this works is you just run, run, run. I would need to plug in the URLs, so I cannot run this notebook right now, but you literally just run it. We have installed all the packages in our Kubernetes cluster. Everything is there; you just run, run, run. But what if you're a seasoned developer and want to run it locally? By all means, go ahead and download these code samples and run them locally, if you know how to set up your Python virtual environment and install all the packages. What we try to avoid is wasting time on "My Mac does not run this library, what should I do?" That ends up wasting a lot of time, because the goal here is to learn the fundamentals of LLMs, not how to install Python and other libraries on your laptop. That's not the best use of our time.

Once we have done vector databases, you understand semantic search, and you're ready for RAG. So we get into the business of retrieval augmented generation, and we start talking about the nuances of RAG. What can go wrong? What if you have some chunks that are missing? What if you have some extra chunks? How do
you break your documents down into chunks? We talk about a lot of different nuances in building a RAG pipeline. And as I said, I'm personally involved day to day on our product side. We have a product with actual paying customers, we have seen many of these issues firsthand, and you will hear real anecdotes. It is not just "Hey, I read this paper and now I'm presenting this paper." Many of the things are going to be hacks, the way engineers work: "We made it work like this." For those of you coming from a software engineering background, we discuss what the challenges were in deploying the microservices, what kind of mistakes we made, and what kind of re-architecture we did in our product. We share the product architecture. Of course we don't share the codebase, but we share all the learnings on the product engineering side and the product management side.

So once again, it is not purely about running these notebooks. It is really about preparing you to build an enterprise application internally. You could be at a startup, you could be a product manager or a dev manager for a company, and now you're tasked with "Go build an LLM application for the enterprise." I'm very confident that by the end of this you should be able to do it.

What we can promise is this: if you are a good dev, you will be able to go and build applications as a developer; if you're a great product manager, you will be able to become a great technical product manager for LLM products. What we cannot promise is that you come in with no coding skills and suddenly, after the bootcamp, start building LLM applications. You cannot
start writing code yourself. Whoever you are right now, a great project manager, product manager, program manager, developer, or dev manager, you will be a great LLM product manager, LLM project manager, and so on. But we are not going to transform you into something you were not before you came in. I hope this makes sense, because this is a common question. That said, if you do not know how to code but you can read code, you will be fine. We have had people who did not have a coding background. If you just want to understand how LLMs work in general, these exercises are very easy to follow. If you look at this, we are talking about changing the name: let's try this keyword, let's try that keyword. In the back of your mind you still understand very well how this works. Of course you won't be able to write code yourself from scratch, but as a technical product manager, when your devs tell you that something is happening, you will know exactly the challenges that they are running into.

Okay, let me go back. Prompt engineering: we'll talk about prompt engineering as well. We have notebooks integrated for pretty much everything that you're seeing. In the interest of time I cannot go through every single notebook, but I will occasionally get into notebooks.

We talk about fine-tuning in quite a bit of detail. We talk about transfer learning, what fine-tuning is, the quantization of models, low-rank adaptation, also known as LoRA, QLoRA (quantized LoRA), and so on. Then we give you instructions to get into our RunPod clusters. We give everyone a credit, everyone creates a RunPod account, and then we give you code. We walk through a lab that fine-tunes a Llama 2 7-billion-parameter 4-bit
quantized model. We ask you to fine-tune the model, and then you compare the model that was not fine-tuned with the fine-tuned model, and you discover certain things that can only be experienced. For instance, no matter how much I tell you that there is something called catastrophic forgetting, where the model forgets what it was originally supposed to do, it is different when you see it in action: "Hey, I fine-tuned the model, why is it acting weird now?" Then you realize that fine-tuning is not as easy as it may appear.

Then we also show you how to deploy a fine-tuned model. In that case we deploy on Union serverless clusters. This morning I had a call with Sage, who will be presenting this, and we were finalizing that module. So you will have very practical experience: the theory side of fine-tuning, then fine-tuning the model, and then deploying it in a serverless fashion. Those of you who know serverless know why we are deploying this as serverless, and those of you who do not know what serverless is will gain a fairly good appreciation and understanding of why something like this should be deployed in a serverless fashion.

This should not scare anyone away. I think that as long as you have the commitment, the motivation, and problem-solving skills, most of these things are common sense. Software engineering, to me, is common sense. Distributed computing is common sense. Why go serverless, why have a containerized application, why have VMs, why have bare metal: all of these make sense as long as the person communicating them to you knows what they are talking about.

We have a very detailed module on LangChain. When we talk about LangChain, notice
this. I wish I could go through all of it. With LangChain, sometimes you may want to template your prompts: you have a UI that you have built, and you want to change the prompts based on the input from the customer. Sometimes there could be example selectors. Let me see if I can find my OpenAI key; maybe I can run a few labs here right now. Okay, just bear with me, please. Okay, this one. I hope this key works and no one has reset it.

If you look at this, it is an example of few-shot learning, and you can see that we are using LangChain for it. You will be provided with these keys that I'm plugging in, and every lab that we do will have something like this. We'll walk through this in far more detail in class. Right now we are learning how to do a few-shot example. You can see that I've given it a few examples: a pirate belongs in a ship, a pilot belongs in a plane, and so on. We are pushing all of these into an in-memory vector database, and now I'm running this. If you look at this, a baby is found in a crib. Let me change it to a doctor: a doctor is found in a clinic or a hospital. Let me see, is it still running? Okay, let me see, did this run or not? Sorry, let me run this again. You will get firsthand experience; it is not some theory mumbo jumbo. So a doctor belongs in a hospital, and a judge belongs in a court. These may sound like very simple examples, but at the end of the day we build this entire understanding of why we have an output parser, why you would need chains, why you would need memory, and
why you need agents, and how you build multi-agent systems. We recently introduced something called LangGraph. LangGraph is for when you have dependencies between multiple agents that are collaborating with each other. We are using LangGraph in our own product, so this is not coming from someone who just happens to know LangGraph from the LangChain website. You will be learning from people who know LangGraph inside out.

Let me go back to the bootcamp. We also talk about observability and monitoring of the models, a very important topic. How do you log things? How many tokens were used? How many calls were made? How many prompts were toxic? How many responses were toxic or violated any norms or company guidelines?

Then we also talk about evaluation. We start with your traditional machine translation type evaluation metrics: the ROUGE score, the BLEU score, and METEOR. Then we move toward semantic techniques like BERTScore, and then to GPTEval, which is one language model evaluating the outputs of another model. Eventually we get to something more formal like Ragas, which is a framework, an open-source library that allows you to evaluate and assess your RAG pipeline.

I can keep going. There's a lot that we accomplish in this, and everything that I said is practical. When we talk about Ragas, there's a practical exercise on Ragas. When we talk about evaluation, there's a practical exercise on evaluation using other techniques. So all of these exercises are going to be very practical. And then on the last day
of the training, in the second half of the last day, we will give you some boilerplate code and some exercises. Why do we give the boilerplate code? Because the intent is not to turn you into a web developer. We give you a very easy-to-deploy web application and then we give you exercises: how about you change this, how about you implement this chain, how about you implement this prompt template, and so on. I think I can take pride in the fact that we have done it in a manner that you will leave with actual practical skills in building LLM applications. I see that there are a lot of questions, so I will start with those. Will we get a copy of this video? Yes, this video is going to be posted on our YouTube channel and also on LinkedIn, if you're joining on LinkedIn Live. As a non-programmer, what prerequisite programming tools do I need to be acquainted with to enroll? All you need to know is Python. Even if you are not a Python developer and have never coded in Python, we have a two-to-three-hour tutorial that we'll give you, and as long as you can ramp up on some of the Python fundamentals, if you can read Python code, you will be okay. If you have never programmed in your life, of course you will have difficulty understanding the programming side of it, but if you have programmed in any language you will be fine. Are there any scholarship options available for students? It's not possible to bear the cost completely from India. Rashab, please reach out. We would love to help you out in any way we can, to the extent that we can, and would love to hear from you. Are you going to provide extra-large monitors during the in-person boot camp? That's an interesting ask, Brides. We do have, because this is happening locally within our
office, so we can give you a monitor. I don't know if the monitors are going to be big enough for your taste, but I think the monitors that we have right now are reasonably big. We can give you some monitors; that should not be a problem. How long will we have access to our environment after the boot camp is over? You will have access to the learning platform for one year after the training, except for the OpenAI tokens. Those API keys are going to be revoked because they are super expensive and we cannot give you those, but for the entire boot camp that cost is covered. Any credit that you get for GPU clusters is yours to keep, so take your time to exhaust it. The online learning platform and the Jupyter notebooks that you see, all of those are yours for one year, so go and use them as much as you like. Do we get to write custom code? Do we get small assignments to build our own LLM? I doubt it. Building your own LLM, I do not think we can do that in a single boot camp; that would probably require a much more extensive boot camp only on writing an LLM. What we are doing here is how you use one of the off-the-shelf LLMs, open source or closed source, and build your own application. So for the most part you're going to be using existing LLMs. There's a small exercise where you fine-tune an LLM, but the focus is on teaching you how to fine-tune an LLM, not necessarily fine-tuning an LLM for a specific task. I'm not sure if you appreciate it, but what you're asking is quite daunting. Building your own LLM may not be something that we can teach in a boot camp that is more introductory in nature. Please talk about the pipeline. I think I did talk about the entire pipeline, if you remember the infographic. I did
it in general terms, not quite in pipeline fashion and not in detail. When we do the boot camp, we definitely talk about the pipeline in depth. Maybe I should bring something up to give you an idea. We talk about the challenges in all stages of the RAG pipeline. We focus more on RAG because that is most likely what you will end up building, so we will be talking about the challenges in building RAG applications quite a bit. I'm going to flash some of the content on the screen to give you an idea and to tell you the key challenges. I'm trying to search for my slide deck that talks about challenges in RAG. Yeah, this one. If you look at this, we talk about the entire pipeline in extensive detail. This is once again a lot of anecdotes, a lot of personal stories. We talk about the possible challenges that you can have and the challenges that we ran into when we were doing ingestion. For instance, we built a pipeline and hoped that it would scale, but the customer decided that they were going to push 100,000 documents in one go, 100,000 pages rather. When they did that, our pipeline got busted, and then we went back and looked at what kinds of things were a challenge there. Then we talk about the challenges in retrieval: too many chunks, too few chunks, chunks that are not relevant, or maybe chunks that are not big enough. Then there is the query, pre-retrieval: how do you understand intent, and all of that. So we talk about all of these challenges across the entire pipeline: what are the challenges that you will run into in each of these stages. I will slowly move through these slides, but of course I cannot cover everything in this session. This is something
this slide deck actually takes about two to two and a half hours just to go through, and we usually do it toward the end. How do we debug the code if we get stuck on an error? The good old way: just spend time, use Google, Stack Overflow, and now ChatGPT. But for the in-class exercises, the code samples have been tested and we make sure that the code is bug free. Then some people decide to modify the code: I have two links in the chain here, how about I add one more, or another one, and so on. We are there to help during the boot camp if you run into issues. After the boot camp, we have a Discord group for the entire cohort; you can collaborate with other people, and we are there to help as well in case you run into any issues. How can I get the comprehensive curriculum? Well, you can attend it online. If you want to come and talk about the curriculum, please feel free to set up a call with us and we will help you understand. SLMs, small language models? Yes, I love it, whoever this is. Yes, we briefly touch upon the idea, nothing very extensive, but the way we do it, you will have a very good understanding of LLMs and SLMs as well. Could you get into more detail about the final assignment? Will it resemble an enterprise application? No. Let me be very honest: anything that you do in a training, in an academic setting, can never resemble the gory details of a real enterprise application. This is an honest answer. I could really get away with just saying, hey, exactly, it will look like a real enterprise application. No. As a practitioner I have a responsibility, and I can tell you: no, nothing close. And you will
have, you know, think of this as a counseling session: we can tell you everything, but you know your own challenges. The real-world problems that you run into are, in many cases, going to be quite unique to your own situation. But it will be as close as it can get in an academic setting, I can assure you of that. It is not going to be exactly like an enterprise application; I would be lying if I told you that. Okay, I hope this is okay. What options do you have for a person who is working and can't spend eight hours per day for a week? At this point we do not have one, though we have shorter courses as well; we have the LLM for Everyone course. But if you want to learn all of this, I don't know if there is a shortcut, because at the end of the week I still feel that I wish we had more time to cover more topics. This is the bare minimum that you need to become productive. We do keep the recordings, so if you miss a session you can watch the recording later. But other than that, if you really want to be an LLM application developer, it is strongly recommended that you just allocate a chunk of time and be done with it. Some of the courses out there, most notably DeepLearning.AI and others, are wonderful courses, but the problem is the switching of context. Learning has to be done more like storytelling: first you do this, then you do this, then you do this, and then it will start making sense. If you do not know how everything is interrelated, it is going to be hard. It will take you a good amount of
time if you did it on your own, and by all means I think you can do it; it is just that it might not be as easy or manageable. Could you get into more detail about the final assignment? I think I answered this one. Do we have any other questions? I think we are done with all the questions, and if there are any more, I'm happy to answer. So I've gone through the curriculum overview. There is another question, from RK: got to know that the session will not make us pro developers, but on the other side, can the reasoning be explained to us? Exactly, this is precisely the idea. Thank you, RK; maybe I did not phrase it well enough. So I think this is the product quantization lab with Weaviate. We are going to explain the code: this is what is happening here, and this is what is happening there. Even if you don't write code when you go back, that's okay, but you will understand. When your dev tells you, I use product quantization, or I'm using hybrid search, you will know exactly what hybrid search means. If they told you that they're using RAG or RAGAS, you would know what metrics they are using. If they're using GPT-Eval, you would know what that means. If they're using embeddings, whether from Llama or maybe some of the OpenAI models, you would know about the dimensionality of embeddings and all of that. You will be able to understand all of that for sure. The code is going to be explained during the boot camp. It is just that, fundamentally, the boot camp is not about improving your coding skills. It might happen in the general territory of LangChain; LangChain is, I think, a good 20 to 25 exercises, 20 to 25 notebooks that we spend six to eight
hours on. Let me give you an idea. You can see here, for instance, this example selector. We are going to talk about, hey, we are creating this dictionary, and then the example selector from examples. You can see: what does this f mean, what is f? We'll explain that. What does k equals two do? And so on. So we are going to talk about all of that, but we are not going to write new derived classes from the FewShotPromptTemplate class. That is the distinction. We'll make sure that you understand what we are doing. I hope this made sense; happy to elaborate if needed. Okay, let me just wrap up. This is who we are. I forgot to mention, we brought in Neo4j this time. Knowledge graphs have been around for quite some time, so for this boot camp we are bringing in Neo4j. For those of you who know, knowledge graphs are useful, Neo4j is a leading player in that space, and knowledge graphs can help you build models that hallucinate less. This is the general layout of where we are. Let me quickly go through. These are our partner testimonials, from the partners who have collaborated with us and think highly of us. These are our customers; you can see big names here: Starbucks is there, Fidelity is there, Guardian is there, Aramco is there, Microsoft is there, Google is there. These are only some of the companies from which we have had attendees at the boot camp over the last year. The next Seattle boot camp is starting next Monday, so this is our last info session for the next boot camp. You can attend it online; if you are not
located in Seattle, that is okay, you can attend remotely. It is going to be the same: some people sitting in our training room here and some people attending remotely. We have all the audio-visual tools: you can hear questions from people in the class, and people in the class can hear questions from you. The camera zooms in on the person in the class who is speaking, or the instructor. So we use technology, to the best of our ability, to make it as in-person-like an experience as possible, but you can be in the comfort of your home and you don't have to travel. You may have to manage time zones, of course; we have had people who manage this across different time zones. I think that's it. We are three minutes early, but I'm happy to answer any other last-minute questions. If not, we can end this session. Thank you so much, everyone, and I'm looking forward to seeing some of you at one of the boot camps. Have a great day, everyone.
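
The evaluation part of the session mentions overlap-based metrics such as ROUGE and BLEU as the starting point before the semantic ones. As a rough illustration only, and not the boot camp's own exercise code, the sketch below computes a ROUGE-1 style unigram overlap in plain Python. The function name `rouge1` and the whitespace tokenization are assumptions made for this example; real evaluations normally rely on an established library with proper tokenization and stemming.

```python
from collections import Counter


def rouge1(candidate: str, reference: str) -> dict:
    """ROUGE-1 style unigram overlap between a candidate and a reference text.

    Hypothetical helper for illustration: lowercases and splits on whitespace.
    """
    cand_counts = Counter(candidate.lower().split())
    ref_counts = Counter(reference.lower().split())

    # Clipped overlap: each word counts at most as often as it appears in both texts.
    overlap = sum((cand_counts & ref_counts).values())

    cand_total = sum(cand_counts.values())
    ref_total = sum(ref_counts.values())

    precision = overlap / cand_total if cand_total else 0.0
    recall = overlap / ref_total if ref_total else 0.0
    f1 = 2 * precision * recall / (precision + recall) if overlap else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


if __name__ == "__main__":
    scores = rouge1("the cat sat on the mat", "the cat is on the mat")
    print(scores)
```

A metric like this only rewards shared words, which is why the session describes moving on to semantic measures and model-based evaluation afterwards.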

Original Description

🚀 Transform your data strategies with our upcoming Large Language Models Bootcamp! Join us for an engaging information session where we unveil the exciting details of our upcoming 5-day bootcamp (both in-person & online).

➡ What to expect during the information session:
• Overview of the bootcamp structure and agenda.
• In-depth exploration of the core topics covered.
• Insight into hands-on projects and real-world applications.
• Meet the expert trainers and learn about their experiences.

➡ Who should attend?
Whether you're an AI enthusiast, a tech professional, a creative thinker, or simply someone eager to explore the possibilities of large language models, this event is tailored for you. We look forward to meeting you!

This video provides an introduction to the Large Language Models Bootcamp, covering the basics of LLMs, prompt crafting, fine-tuning, and retrieval augmented generation, with a focus on practical skills and hands-on experience.

Key Takeaways

Build an LLM application
Deploy an LLM model
Fine-tune an LLM
Craft effective prompts
Apply Retrieval Augmented Generation to LLM applications

💡 Retrieval Augmented Generation is a key concept in LLM applications, allowing for more accurate and efficient generation of text.
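
To make the retrieve-then-generate idea concrete, here is a minimal sketch, assuming a toy bag-of-words similarity in place of a real embedding model and vector database. The helper names (`embed`, `cosine`, `retrieve`, `build_prompt`) and the sample chunks are hypothetical and exist only for this illustration; the resulting prompt would then be sent to whichever LLM the application uses.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: bag-of-words token counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: str, chunks: list[str], k: int = 2) -> list[str]:
    """Return the k chunks most similar to the query."""
    q = embed(query)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]


def build_prompt(query: str, chunks: list[str], k: int = 2) -> str:
    """Assemble a prompt that grounds the answer in the retrieved chunks."""
    context = "\n".join(f"- {c}" for c in retrieve(query, chunks, k))
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {query}"
    )


if __name__ == "__main__":
    # Illustrative chunks only.
    docs = [
        "The boot camp runs for five days.",
        "Neo4j is a graph database.",
        "Participants keep notebook access for one year.",
    ]
    print(build_prompt("How many days does the boot camp run?", docs, k=1))
```

The chunk size and the value of `k` are exactly the kind of retrieval choices (too many chunks, too few, irrelevant ones) that decide whether the model sees the right context.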
