Large Language Models Bootcamp Information Session

Data Science Dojo · Intermediate ·🧠 Large Language Models ·1y ago

Skills: LLM Foundations 90% · Prompt Craft 80% · Fine-tuning LLMs 80% · LLM Engineering 70%

Key Takeaways

The Large Language Models Bootcamp covers foundational and advanced topics in LLMs, including embeddings, transformers, attention mechanisms, vector databases, and RAG applications, with a focus on hands-on learning and real-world applications using tools like Python, LangChain, and Azure.

Full Transcript

Thanks, and welcome to the information session for our large language models boot camp and a recently introduced boot camp on agentic AI. I will explain both in a moment. The large language models boot camp is a more detailed, longer-duration boot camp. The agentic AI boot camp is for people who are, I wouldn't call it advanced, but slightly familiar with the concept of generative AI; they can come in and learn to build agents and agentic AI workflows. I will walk through both of them in just a bit. My name is Raja Iqbal. I'm the chief data scientist and a lead instructor at Data Science Dojo. I've been doing this for a long time: I've spent more than half of my life doing AI, machine learning, analytics, data science, and so on. We're one of the oldest players in this space. We have been teaching through our public boot camps and our enterprise trainings, with more than 11,000 graduates from our boot camp, perhaps more than any other boot camp in the world. And when we talk about a boot camp, we are not talking about two-hour or three-hour trainings; we are talking about longer-duration, comprehensive trainings, and we have a big footprint globally. So let's look at why you need to learn to build these applications. A common question is: well, I can go and build an agent in Copilot, or in Perplexity, ChatGPT, Glean, and the list goes on. It's a very crowded space. The problem with many of these tools shows up once you get past their sales team.
When you actually start onboarding, you suddenly realize that the AI that was promised to you is not the same AI that was delivered to you. What I mean is that hallucinations are still going to be a problem. So is the brittleness of prompts: you ask the same question in a slightly different manner and the answer is going to be different. Then there is data governance. Do you want to build things in their cloud? If you upload your data to their cloud, what is the guarantee that your data is secure? And even if they give you a guarantee, perhaps the industry that you are in or its regulatory requirements prevent you from doing things in a certain way. How do you scale? There are so many more problems that you can run into. And if you decide to build a solution like this on-prem, or not only on-prem but within your own cloud, how do you deal with all of these challenges? Even with the most technical team, there are a lot of ideas and concepts to learn before you even start building something serious enough. Sometimes you hear this idea: hey, I can just clone this Git repository and build a RAG application on a single PDF file. But we are not talking about a single PDF file. We are talking about a PDF file, a JSON file, a piece of Python code, a CSV, an Excel file, a Word doc, or a PowerPoint. And we're not talking about one or two of them. We are talking about possibly thousands, tens of thousands, or maybe hundreds of thousands of those files. And we are not talking about all of them located in a single Dropbox, SharePoint, or Google Drive. A serious company would have its data in different places.
The data is located all over: the file formats can be different, the locations can be different, the permissions can be different. Also, sometimes you don't rely on static files; sometimes you're relying on APIs for your data. There are so many variations in data ingestion, in what kind of models you can use, and in what kind of prompts you can use, and all of that makes building large language model applications quite challenging. What we do in both of our boot camps is give you a very detailed introduction to pretty much everything that you need. In our large language models boot camp, we take an approach that starts from the basics: we start with embeddings, transformers, and the attention mechanism. So let's go with the LLM boot camp first, and then I will tell you where we differentiate in the agentic AI boot camp. In the large language models boot camp, we do not assume any background, so you can be really anyone. No machine learning background is needed, and not a very strong coding background either. If you can read and understand Python code, and I will show you how we have set things up, you can attend the boot camp and do fairly well. We start with the notion of embeddings, transformers, and the attention mechanism. We go deep inside vector databases. We get into LangChain and agentic workflows. Then we get into observability and monitoring. Then we talk about guardrails: how do you keep your application safe? Can someone come in and get my LLM application to divulge details that it should not be giving to the user? These are what we call prompt hijacking and jailbreaking problems. We spend quite a lot of time on evaluation, right?
Evaluation is a very interesting area. In classic machine learning, take fraud detection: you have baseline data sets, a transaction is either a fraud or not a fraud, and you can know whether your prediction is right or wrong. But in language, the same thing can be said in different manners. So how do we evaluate our LLM application such that we know it is performing well? For those of you who are coming from a software engineering background: how do you set up an evaluation pipeline so you can test changes in your application? Call it the equivalent of regression testing. We also worry about the deployment and ops aspects. We get into the fine-tuning business as well. We talk about some of the product management challenges and technical debt. The course is taught by people who work in industry and are building these things. We take pride in the fact that we are not a company that just gathers some material on transformers and some material on vector databases from this YouTube lecture or that paper. All of our courses are practitioner courses, created by practitioners for practitioners. And to be a practitioner, you don't have to be a hardcore developer. You can be a technical product manager, you could be a founder, no matter who you are. If you are interested in large language models and building large language model applications, we will enable you. Now, a very high-level difference between the agentic AI boot camp and the large language models boot camp. The LLM boot camp is both in-class and instructor-led. Some people attend it at our Seattle location at Data Science Dojo. Some people decide, hey, we cannot take the entire week off.
It's a five-day, 40-hour boot camp. So some decide that they cannot take an entire week off. We have had people attend from as far away as Australia, Europe, the UK, Spain, Italy, and South America, so we have people who attend remotely from all over. But then there are people who want to attend a shorter-duration course: hey, we already know about transformers, we already know about vector databases. So the agentic AI boot camp focuses strictly on agents and agentic workflows. We do cover agents and agentic workflows even in the large language models boot camp, but we get into somewhat more advanced design patterns in our agentic AI boot camp. It also has a more flexible schedule. I believe for the next cohort that is coming up, it is every Wednesday. Every Wednesday we will have a three-hour class, and we will go for eight weeks. There are going to be homework and assignments and all of that. So this is a shorter-duration course, or maybe less of a time commitment, than the large language models boot camp. Please feel free to ask any questions. If you are on our Zoom call, please ask there. If you are attending one of the live streams on LinkedIn, Facebook, Twitter, or anywhere else, ask a question and it will be directed to me.
At a very high level, this is how I like to look at an application. At the core of an LLM application is your generative AI, what we call the LLM, the large language model. Large language models can be open source or closed source. They can be on-prem, they can be in your VPC, or you could be leveraging them as a service. Take OpenAI: OpenAI's models are closed source. You can get one directly from OpenAI; for GPT-4.1, you literally just go and get an API key, and then you are using the shared deployment given to you by OpenAI. Then there is another deployment paradigm, which is deploying it in your own VPC. You could deploy an OpenAI model in your own Azure cloud, and I believe Anthropic's models are closed source but available in the Amazon Bedrock ecosystem. Or you can decide to deploy a model on-prem, on your own laptop, desktop, or server, depending upon how big the model is and all of that. In addition to that, most of the applications are going to be retrieval augmented generation applications, commonly known as RAG. So you will probably need a vector database. It's a crowded market once again. You will need an embedding model to create embeddings and store them in that vector database. You will need some sort of data pipeline that processes your data and then puts it in the vector database. Then you will need some sort of API connectors and some services. In addition to that, you might need a cache for large language model responses, commonly known as a semantic cache. Then you have orchestration, with orchestration tools like LangChain and LlamaIndex.
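To make the moving parts of a RAG application concrete, here is a minimal sketch in plain Python. It is an illustration under loud assumptions, not what the boot camp labs use: `embed` is a toy bag-of-words stand-in for a real embedding model, `ToyVectorStore` is a hypothetical in-memory stand-in for a vector database, and the two sample documents are made up from facts mentioned in this session.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy 'embedding': word counts. A real system would call an embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class ToyVectorStore:
    """Hypothetical in-memory stand-in for a vector database."""

    def __init__(self):
        self.items = []  # list of (text, vector) pairs

    def add(self, text):
        self.items.append((text, embed(text)))

    def search(self, query, k=1):
        query_vec = embed(query)
        ranked = sorted(self.items, key=lambda item: cosine(query_vec, item[1]), reverse=True)
        return [text for text, _ in ranked[:k]]


def build_prompt(question, store, k=1):
    """Retrieve the top-k chunks and place them in the prompt as context."""
    context = "\n".join(store.search(question, k))
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"


# Data pipeline step: ingest documents into the store.
store = ToyVectorStore()
for doc in [
    "The LLM boot camp runs for five days and forty hours.",
    "The agentic AI boot camp meets every Wednesday for eight weeks.",
]:
    store.add(doc)

# Retrieval plus prompt construction; the prompt would then be sent to an LLM.
prompt = build_prompt("How many weeks is the agentic AI boot camp?", store)
print(prompt)
```

The generation step is deliberately left out: in a real application the prompt would go to whichever model you deploy, and the toy store would be replaced by an actual vector database.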
You will need some sort of logging and LLMOps: understanding how long a prompt is taking, how many tokens were consumed, how long it took to reason, and so on. There are standard tools and frameworks available for that. You will need some guardrails. So it's a fairly complex ecosystem. And yes, any middle schooler who knows Python can go and build a RAG application, but even somewhat advanced developers need to do a lot of work before they can build an application that can be deployed for practical use while it keeps the company secure, meets all the regulatory requirements, and meets your data governance, scalability, and user experience needs. That is a non-trivial endeavor. What we teach in these two boot camps is how to build these systems from scratch. That's the high-level idea. Let me go through maybe a few more slides, and then I will show you our learning platform and what we have. If you look at the technology stack that we use, it's quite diverse. We primarily use OpenAI's GPT-4o and GPT-4.1, but we also touch a little bit on the Llama series of models. In many cases we use the OpenAI endpoints, sometimes directly from OpenAI and sometimes the Azure-deployed version. We also talk about fine-tuning the model, and when we talk about fine-tuning, we don't simply handwave with presentations: hey, this is what fine-tuning is. We actually talk about the theory of fine-tuning: fine-tuning, distillation, transfer learning, quantization, low-rank adaptation, QLoRA.
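A small piece of arithmetic shows why low-rank adaptation and quantization matter for fine-tuning. The sketch below assumes a single 4096 x 4096 weight matrix and a LoRA rank of 8; both numbers are illustrative assumptions, not figures from the boot camp. LoRA freezes the original matrix W and trains two small matrices B (d_out x r) and A (r x d_in), so the trainable parameter count drops from d_out * d_in to r * (d_out + d_in).

```python
def full_finetune_params(d_out, d_in):
    """Trainable parameters when the whole weight matrix is updated."""
    return d_out * d_in


def lora_params(d_out, d_in, rank):
    """Trainable parameters for LoRA: B is (d_out x rank), A is (rank x d_in)."""
    return rank * (d_out + d_in)


def weight_memory_gb(num_params, bits_per_param):
    """Approximate memory for the weights alone, in gigabytes (10^9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


d = 4096   # assumed hidden size of one projection matrix
rank = 8   # assumed LoRA rank

full = full_finetune_params(d, d)   # 16,777,216
lora = lora_params(d, d, rank)      # 65,536
print(f"full: {full:,}  lora: {lora:,}  ratio: {lora / full:.4%}")

# Weights-only footprint of a 7-billion-parameter model at two precisions.
print(f"16-bit: {weight_memory_gb(7e9, 16):.1f} GB, 4-bit: {weight_memory_gb(7e9, 4):.1f} GB")
```

Under these assumptions LoRA trains under half a percent of the parameters of that matrix, and 4-bit quantization shrinks the weights of a 7-billion-parameter model from about 14 GB to about 3.5 GB. Activations, optimizer state, and quantization overhead are ignored here.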
Once that theoretical foundation is set up, we give you access to a GPU cluster where you go and deploy a fine-tuned model. Let me take that back: you start with the Llama 2 4-bit quantized 7-billion-parameter model, then you go ahead and fine-tune the model and compare how the fine-tuned model is different from the model that was not fine-tuned. A lot of interesting learning happens there. We also talk about deployment in a different session. We talk in detail about vector databases, and Weaviate is our partner in that; I will show you maybe one of these examples. So you can see there is a lot of learning and a lot of tools, and the boot camp is streamlined in a manner that we don't fiddle with tools too much. Sometimes people ask: is 40 hours enough? It is never enough; learning is never enough. But we will have a lot of learning in those 40 hours, because we do not waste time figuring things out, deploying this thing here, and debugging that thing there. We have things set up already. Once you register, you will be given access to this learning platform. You can see that all the content is neatly organized. We start with a high-level introduction to the bigger architecture, how you build these applications. Then we get into the details of transformers and the attention mechanism, with all the details that you might like to know, starting from the basic intuition of what deep learning is and going all the way to "Attention Is All You Need", transformers, and encoder-decoder architectures.
Then prompt engineering. I will give a very high-level overview and then maybe drill down on a few of these modules. Then there's a practical introduction to vector databases. We spend almost five to six hours just learning about vector databases, because very likely, when you go back, you will be building a RAG application, and understanding vector databases inside out is incredibly important. We spend almost eight hours, almost an entire day, on LangChain. I will drill down into this as well and mention in a bit why LangChain is important. We talk about the challenges in building RAG applications. We get into LLM observability and guardrails. We talk in a lot of detail about fine-tuning of models and about evaluation. And then on the last day of the boot camp, everyone builds their own LLM application. So let me go inside and show you what a single module looks like. When I click on this, I'm inside the module "A practical introduction to vector databases". When we talk about vector databases, we talk about a lot of things in detail. We start at a high level: what is a vector database? Why do you need a vector database? How is it different from or similar to a traditional database? We build it very slowly, build the momentum, and then go to very detailed, nuanced topics like improving retrieval. How do different indexing techniques work? How do we perform a vector search? What is a keyword search?
What is a semantic search? What is a hybrid search? And how do you bring in faceting and filtering features like those from traditional BI? If you look here, you can see that we are explaining things in a lot of detail, going step by step through how a vector search happens in a vector database. Of course I cannot go through every single slide here, but you can see that we are talking about quite a lot of things. Once we have covered all of these, we go to the exercises, and the exercises are aligned with the topics that we just covered. Say we talked about vector search, and you think: hey, I looked at those slides, but I don't get it. Well, when I click here, I'm launching this vector search notebook. You will be given a deployed endpoint and API keys. We'll show you how to set up a vector database, and after that we'll show you how to query a vector database. Then we go step by step and explain: this step is doing this, this step is doing that, and so on. All of you get your own dedicated compute and your own dedicated storage in our learning platform. If you want to, you can download the code samples, of course, but you do not have to set anything up. We want to focus on one and only one thing, which is learning. We have been doing this for long enough to know the problems: your IT does not allow this, you don't have admin privileges to install certain packages or software on your laptop, or, hey, my Mac would not install this particular Python package.
Or there is a different dependency that needs to be sorted out. In this case, all of your compute and storage is in our cloud. All you have to do is come in and run. If you want to create a new code sample, if you want to practice, you can launch a new notebook and do whatever you need to. And this is included in your subscription for one year. When you sign up for the boot camp, you get one year of access to all of these compute resources. You can come back anytime, play around, and use it as a sandbox. There is a question: what if I would like to check in my project work under my GitHub account instead? We wouldn't stop you. You can check it in anywhere you like; it doesn't matter really. As a matter of fact, for the final project, we ask you to push it into your GitHub, because eventually we push it into Streamlit, and then the process becomes quite streamlined. So it is completely fine if you push everything into your own GitHub, except that you cannot push our proprietary material into your GitHub account and monetize it. Other than that, we are cool with it. We enjoy learning, we enjoy teaching, and as long as there is fair play, it is completely fine for us. We don't mind how you use the code. Now, if you look at this, I can go to a generative search example. We go in order: vector search, similarity search, hybrid search, generative search, multi-tenancy. We have already talked about these concepts in our session, and now we are coming in and talking about them in practice. So we spend a lot of time explaining all of these ideas.
Now, what is generative search? Think of generative search as a poor person's RAG. It is the most basic RAG application that you can build. In this case, what we show is how you create a vector database, and how, after creating the vector database, you push some data into it or create a collection. How do you configure a specific model? In this case, you can see that we're configuring an OpenAI embedding model, but what if you want to configure a Cohere model? We'll talk about all of those details. And sometimes, on the fly, we will say: okay, what if I wanted to use a different model? Instead of GPT-4o mini, what if I wanted to use GPT-4.1? So we go through this. These are ideas we can definitely learn in theory, but over here you can see that of the properties that I'm creating, some are vectorized and some are not; I'm skipping vectorization. What are the implications of that? So we go slowly, build the momentum, try different queries, and really understand things inside out. The goal is not to carry some superficial knowledge about what RAG is. When someone says, well, I'm doing generative search, or I was doing hybrid search and the alpha value was this, you understand exactly what they're talking about. So we will enable and empower you to be fairly well informed in this whole business of application building.
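For readers wondering what "the alpha value" in hybrid search refers to, here is a simplified sketch. In Weaviate's hybrid search, an alpha of 1 is a pure vector search and an alpha of 0 is a pure keyword search. The fusion below (min-max normalization followed by a weighted sum) is an illustrative assumption and not the exact algorithm any particular vector database uses; the document ids and scores are made up.

```python
def min_max(scores):
    """Rescale scores to the range [0, 1] so the two result sets are comparable."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc: 1.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}


def hybrid_rank(keyword_scores, vector_scores, alpha):
    """Blend keyword and vector scores: alpha=0 is keyword only, alpha=1 is vector only."""
    kw, vec = min_max(keyword_scores), min_max(vector_scores)
    docs = set(kw) | set(vec)
    fused = {d: alpha * vec.get(d, 0.0) + (1 - alpha) * kw.get(d, 0.0) for d in docs}
    # Highest fused score first; ties broken by document id for determinism.
    return sorted(fused, key=lambda d: (-fused[d], d))


# Made-up scores for three documents.
keyword_scores = {"doc_a": 3.0, "doc_b": 1.0, "doc_c": 0.0}   # e.g. BM25-style scores
vector_scores = {"doc_a": 0.2, "doc_b": 0.9, "doc_c": 0.5}    # e.g. cosine similarities

for alpha in (0.0, 0.5, 1.0):
    print(alpha, hybrid_rank(keyword_scores, vector_scores, alpha))
```

With these made-up scores, the top result flips from `doc_a` (best keyword match) to `doc_b` (best vector match) as alpha moves from 0 toward 1, which is the trade-off being tuned when someone quotes an alpha value.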
I intentionally don't like calling it being an expert. I know that from a marketing standpoint it is tempting to tell someone: hey, you will be an expert, you will be a guru in LLM applications. But the problem is, no one becomes an expert by attending a single course. Even 100 hours of learning is not enough. It is your own endeavor, your own undertaking: you will go back and spend time on it, and then be on that road to expertise. But this is by far, I think, the only 40-hour boot camp, as far as I know. We were the first one. I'm sure there are more that will pop up eventually, but it is definitely the most comprehensive one. It is definitely a boot camp where you will be taught by people who build things. They are not just instructors. Okay, LangChain. I will show you what we do in LangChain, maybe just a few different labs, and then we will move on. Think of LangChain as a framework that helps you with the plumbing of your LLM application. If you directly called a large language model alone and asked it anything, you would be shocked at how clueless these models are without context. They understand the language, and they do have some information about historical facts and all of that, but they lack proper context. Let me give an example: who is the president of the United States? If I go and ask ChatGPT, and ChatGPT is an application built on large language models such as GPT-4.1 and GPT-4o, it will know that I'm talking about today, and in this time frame it will say Donald J. Trump is the president.
However, if I ask a large language model directly who the president is, it will possibly answer as of some earlier date, depending upon which model it is. I think GPT-4 will tell you that, as of September or October 2023, Joe Biden is the president. So what is the problem? The problem is context. To build that context, you need to be able to create templates with variables for these models. You should be able to hook these models up to different data sources. You should know the different kinds of document loaders, text splitters, and document splitters. Whether you want to connect to a vector database, a different kind of vector database, SharePoint, Dropbox, Google Drive, or Salesforce, with structured or unstructured data and different types of data sources, LangChain is your tool of choice for that. LangChain is not the only tool of choice. There are other tools out there, but LangChain and LlamaIndex are by far the most popular ones. So we get into this: how do you chain your prompts and your LLM function calls? Then, how do you build memory? I'm carrying out a conversation. As I'm speaking with you, those of you who joined the session from the beginning have some context, some memory of our conversation. But what if someone joins abruptly right now? Do they have the context or not? LangChain has these context-building tools, and memory is one of them. We'll talk about that as well. Then, introduction to agents. Of course we'll talk about agents. Let me just randomly click on one of them. All of these labs we do in class. LangChain alone has possibly 16 to 20 code samples that we do in class.
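The two ideas above, templates with variables and conversation memory, can be sketched without any framework. This is plain Python and deliberately not LangChain's API: `TEMPLATE`, `BufferMemory`, and `build_prompt` are hypothetical names chosen for illustration. The point is that the application, not the model, supplies today's date, retrieved context, and the recent conversation.

```python
from collections import deque
from datetime import date

# A prompt template with named variables that the application fills in.
TEMPLATE = (
    "Today's date: {today}\n"
    "Context:\n{context}\n"
    "Conversation so far:\n{history}\n"
    "Question: {question}"
)


class BufferMemory:
    """Keeps only the most recent turns so the prompt stays within a budget."""

    def __init__(self, max_turns=3):
        self.turns = deque(maxlen=max_turns)  # older turns fall off automatically

    def add(self, user, assistant):
        self.turns.append((user, assistant))

    def render(self):
        lines = [f"User: {u}\nAssistant: {a}" for u, a in self.turns]
        return "\n".join(lines) or "(none)"


def build_prompt(question, context, memory, today=None):
    """Fill the template; the result is what would be sent to the model."""
    return TEMPLATE.format(
        today=today or date.today().isoformat(),
        context=context,
        history=memory.render(),
        question=question,
    )


memory = BufferMemory(max_turns=2)
memory.add("Which boot camp is foundational?", "The LLM boot camp.")
memory.add("How long is it?", "Five days, 40 hours.")
memory.add("And the agentic one?", "Eight weekly three-hour classes.")

prompt = build_prompt(
    question="Which one needs less time per week?",
    context="(retrieved documents would go here)",
    memory=memory,
    today="2025-01-01",  # fixed date only to keep the example reproducible
)
print(prompt)
```

Because `max_turns` is 2, the first exchange is dropped from the rendered history, which mirrors the situation of someone joining the session late: whatever is not in the prompt is not in the model's context.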
Fairly intense. Intense not in the sense that it is going to be very difficult to understand, but in the sense that it is a lot of material. But we spend time and make sure that all the questions are answered. For instance, this is an example for agents and tools. In this case, let me go back to a search tool that you can probably relate to. This is another example. Here I'm setting up a web search tool. Basically, when I ask a question, it will, if needed, go and search the web, bring back the results, and interpret the answer in the context of the web search. There are many such cases: who's the current prime minister, who's the president of the United States, and so on. Those kinds of questions, which are related to the present, we'll talk about. Then we'll talk about multi-agent collaboration. We'll talk about RAG agents with LangGraph. Think of LangGraph as a tool that helps you do state management. In many cases we have a query that cannot be answered in a single go. When the person asks a question that depends on multiple LLM calls, LangGraph is a way to build those dependencies: first I will do this, then I will go and do this, and then I will go and do that. That kind of dependency graph we talk about in a lot of detail. If you look at this, we start with a function call, and there's a tool node, a query rewriting node and all of that, and the generation node.
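The "first this, then this, then that" dependency graph can be illustrated with a few lines of plain Python. This sketch does not use LangGraph's API; the node names, the `NODES` and `EDGES` tables, and the `run_graph` helper are hypothetical, and the `generate` node only formats a string where a real graph would call an LLM. What it shows is the core idea: each node reads and updates a shared state, and edges decide which node runs next.

```python
def rewrite_query(state):
    """Query rewriting node: normalize the user's question."""
    state["query"] = state["question"].strip().rstrip("?").lower()
    return state


def retrieve(state):
    """Tool node: naive keyword retrieval over a tiny in-memory corpus."""
    words = set(state["query"].split())
    state["documents"] = [
        doc for doc in state["corpus"] if words & set(doc.lower().split())
    ]
    return state


def generate(state):
    """Generation node: placeholder for an LLM call that uses the retrieved documents."""
    docs = state["documents"]
    state["answer"] = f"Based on {len(docs)} document(s): " + " ".join(docs)
    return state


NODES = {"rewrite_query": rewrite_query, "retrieve": retrieve, "generate": generate}
EDGES = {"rewrite_query": "retrieve", "retrieve": "generate", "generate": None}


def run_graph(state, start="rewrite_query"):
    """Walk the graph from the start node, passing the shared state along."""
    state["trace"] = []
    node = start
    while node is not None:
        state = NODES[node](state)
        state["trace"].append(node)
        node = EDGES[node]
    return state


result = run_graph({
    "question": "What does LangGraph do?",
    "corpus": ["LangGraph manages state across steps.", "Weaviate stores vectors."],
})
print(result["trace"])
print(result["answer"])
```

A real graph would typically add conditional edges, for example routing back to query rewriting when retrieval returns nothing; the linear version here keeps the state-passing mechanics visible.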
So there is a lot of detail; of course I cannot cover all of that in the limited time that we have. It is a very detailed introduction to LangChain and LangGraph. Then we talk about challenges in building RAG applications. It may look like building a RAG application is fairly straightforward, but when you start building those applications, scalability, security, privacy, and prompt brittleness all become real challenges. We spend quite a bit of time on LLM observability and setting up the right guardrails. Fine-tuning, as I mentioned earlier, we go into in a lot of detail. Then evaluation. I wish we had the time to get into all of those sessions. I will just randomly click on a few of them to give you an idea. When we talk about evaluation, let me just bring this up, we set the tone by breaking it down. You have some tasks, you have some data sets, you have some metrics. We talk about all of them and set the context. We talk about what kinds of tasks are possible and what kinds of data sets are available for each kind of task. Then we go on to learn different evaluation techniques, all the way leading to evaluation of your RAG pipeline. Similarly, for every session that we complete, we have end-of-session exercises, in this case evaluation using RAGAS. When I go here, into evaluation using RAGAS, you can see that we set up a RAG pipeline and then we set up our RAG chain.
What are the questions in the ground truth? How do you evaluate the model's performance? There are key metrics that we talk about: faithfulness, relevancy, context recall, and so on. By this time you already understand these from the slide deck, and now you're looking at the outcome. Finally, on the last day of the boot camp, we build an end-to-end LLM application. We build that application and give you some exercises: why don't you add this chain? Why don't you add a search agent? How can you add memory to your application? We give you boilerplate code for a web application where you can start, because not everyone is a web developer, and we give you a VM with all the code installed. You connect it to your own GitHub repository, and we show you how to deploy it. The goal is to enable you, inspire you, and empower you, and beyond that you should be able to deploy things on your own, or at least get started. One thing that we have never done at Data Science Dojo: we are not in the business of selling snake oil. Sometimes the pitch is: hey, become a data scientist. Hey, I don't even know you; maybe you don't even know how to code in Python. So we are not in the business of, as I call it, selling snake oil, this metamorphosis of someone magically transforming into a data scientist or a rockstar LLM applications engineer. If you're a product manager, you will be a great LLM application product manager. If you're a great dev, you will be a great LLM application developer.
So the goal here is to make you a better professional in whatever you do. We have had people from all across the spectrum: people managers, dev managers, and consultants have attended the boot camp, and they have done very well; when they went back, they built applications. But we are not here to fix fundamentals. If a 30-year-old person shows up and doesn't know data structures or isn't good with cloud concepts, we cannot compensate for that, and we are not going to fix those fundamentals. The fundamentals of generative AI, vector databases, fine-tuning, LangChain, and so on will definitely be solid, but of course that has to be taken in context. That's my take as an honest educator; we are educators first, before anything else. Let me see what else we have. For the LLM boot camp, the prerequisite is a working knowledge of Python; if you can read Python code, most people get by. For the Agentic AI boot camp, we expect that you are a reasonable coder, not a rockstar coder, and that you understand some of the fundamentals of transformers and the basics: what is an embedding, and why do you need one? That's because in the Agentic AI boot camp we focus more on the agentic side and on how you accomplish certain things using LLMs. The LLM boot camp is more foundational. Yes, it does cover some agentic behaviors, but the Agentic AI boot camp goes further, though I don't want to mislead by calling it too advanced.
I think anyone who knows a bit of coding and understands how LLMs work can get by. We do give remedial material for both of them. If someone wants to take the Agentic AI boot camp straight away, come talk to us and we can make a recommendation. We don't call these sales calls; we call them advisor calls. Just come in, and we'll coach you and guide you on what you should do. The benefit of the Agentic AI boot camp is that you can attend it on your own schedule and you don't have to travel. It is purely online but still live and instructor-led, just like this Zoom call; we will be here interacting, and it is the same experience. But it is spread over eight weeks, as opposed to our LLM boot camp, which is concentrated in a single week. People are different. Some say, "I want to be in and out in one week; I don't think I can maintain focus for eight weeks, and I like in-person interaction, so I want to be onsite." It really is your judgment call; we are offering both. I think I have gone through these slides already. Okay, let me see where we are. I did mention this, but in case it wasn't entirely clear: sometimes people ask, what laptop do I need? Don't worry about it. Do I need GPUs? Don't worry about it. Do I need subscriptions? Don't worry about it. Just bring your laptop. As long as you have a laptop with a web browser, we'll take care of everything, because we do not want to waste time on setting things up: a credit card that is not working, this subscription, that subscription. We cover all the token costs during the boot camp. For the Jupyter notebooks, everyone gets compute and storage.
Then, when we do the fine-tuning exercises, we cover the cost of the GPU clusters; you get Nvidia GPUs in the RunPod cloud. We cover the costs of all of this as part of your boot camp registration fees: all the compute, all the subscriptions, and all the infrastructure needs for the boot camp. Also, we have an academic partnership with University of New Mexico Continuing Education, and a consequence of that is that many companies cover our boot camp as part of their tuition reimbursement policy. I remember that in the last boot camp we had people from Salesforce, Boeing, and Apple, but there are many other companies that cover these trainings as long as they come from an accredited institution, a requirement that we meet. You can simply ask your HR to cover the cost. This is like your health insurance or other benefits that your company may have. Many people don't know about these benefits at their company, so you may want to check. We were surprised that many companies have a yearly training budget of four, five, up to ten grand as part of their benefits program, and we meet that criterion. So just let us know, and we would be more than happy to help out. As for the companies I mentioned, we have had people from them who have already attended. We have a stellar lineup of instructors, practitioners from all over. Some of them show up frequently, and some we invite for talks depending on their availability. So we try to create a very good mix of high-quality thought leaders from industry who come in and talk at the boot camp.
We have built this interesting ecosystem where we subsidize the infrastructure costs in the form of credits through our partners. Also, this is a very small list. More than 3,500 companies have attended our trainings, and you name it: pretty much any company that matters on the planet, someone from that company has attended our trainings. Hear from our partners; they love our trainings. We have had people from big companies, and there's a long lineup of people who have attended our generative AI boot camp. Go check it out: we have real people, and you can go and ask them. We don't have any John Does and Jane Does on our website. We have thousands of people who have attended, and they love us enough that they have given us permission to put their names on our website and in our promotional material as our champions. Likely for many of you, whichever company you are from, someone from that company has already attended our training. Just reach out to us and we'll let you know. We are not one of the big-name, venture-funded companies with big operations and very seasoned salespeople who just make sure that they close the deal. We are very invested in our attendees, our learners, and we want to make sure that everyone gets value out of it. I think that should be it. Our next LLM boot camp is happening in Seattle from June 9th to 13th; this is the 5-day boot camp. And we recently announced the first cohort of our Agentic AI boot camp. Let me go here now. I'm logged in as an admin here.
It's just taking forever to load. Okay, let me go to our boot camps, to the Agentic AI boot camp. I believe we are starting in July; I will have to check with my team. In the Agentic AI boot camp, we have a slightly shorter list of instructors, and we will talk about context building and multi-agent applications there. The next boot camp starts on July 9th, with classes every Wednesday, 3 hours a week. So that is it. I'm happy to answer any questions that anyone may have, and I will give it a few more moments. So RF, I see you're saying that you attended our one-week in-person boot camp and that it was one of the best experiences. Thank you so much for your kind words, RF. If the LLM boot camp makes sense to you, I can assure you that we are upholding the same standards that we maintained, and maybe we are a bit better as a more mature company. So please feel free to get on a call; happy to talk to you. Let me see if there are any other questions here. There is a question: would you please share the adoption of GenAI models such as Claude and Grok in enterprise projects versus in-house trained models on top of base models? This is a very difficult question. I can share my general thoughts on this. The decision about which model to use is driven primarily by whether you're okay with sharing your data with a third-party service or not.
Some companies would not be comfortable sharing their data with a third-party SaaS application. On the other hand, I can also deploy these models in my own VPC, OpenAI and Anthropic being examples. OpenAI is a closed-source model, but you can still deploy it in Azure in your own tenant. Likewise, Anthropic is a closed-source model that you can deploy in Bedrock, as far as I know. But then someone will say, no, I do not want to use the cloud. Healthcare is a case in point: in healthcare, people say, we don't want to put data in the cloud. So what do you do then? You bring it on-prem. For on-prem, if I decide to deploy a Llama 3.1 model at roughly 400 to 500 billion parameters, the hardware requirement is going to be $250,000 to $300,000 just to serve that model, before being able to scale and all of that. So there are a lot of variables and factors, and it's a very difficult question to answer in just an information session. This is more of an hour-long or two-hour-long conversation over a cup of coffee, because there is no exact answer. I can go to a small language model, but what if my application needs a more advanced model? Small language models work very well for certain cases but may not work well in others. Whichever answer I give is not going to be complete. I can give you the pros and cons of all the deployment models, and we talk a lot about these kinds of trade-offs. In the boot camp we have 40 hours, so we talk a lot about these trade-offs and how to balance them. Okay, sounds good. I think there are no more questions, so we'll end the call.
Thank you so much, everyone, for attending. Please feel free to reach out; we are happy to help you in an advisor role. We'll be happy to help you figure out whether the LLM boot camp or the Agentic AI boot camp is right for you, and if neither boot camp is a good fit, we'll tell you. We do that. So just get on a call and let us know. Okay, thanks everyone.
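The on-prem sizing point raised in the Q&A can be made concrete with simple arithmetic on the model weights. The sketch below is not from the session. It assumes the largest Llama 3.1 model (405 billion parameters), 2 bytes per parameter at 16-bit precision, and hypothetical 80 GB accelerators; the function names are illustrative. It counts weights only and ignores KV cache and activation memory, so a real deployment would need more.

```python
import math


def weight_memory_gb(n_params: float, bytes_per_param: float) -> float:
    """Memory needed for the weights alone, in decimal gigabytes."""
    return n_params * bytes_per_param / 1e9


def gpus_needed(memory_gb: float, gpu_memory_gb: float) -> int:
    """Smallest number of GPUs whose combined memory holds the weights."""
    return math.ceil(memory_gb / gpu_memory_gb)


# Assumption: largest Llama 3.1 model, 405 billion parameters.
N_PARAMS = 405e9
# Assumption: a hypothetical accelerator with 80 GB of memory.
GPU_MEMORY_GB = 80

# Bytes per parameter at common precisions.
for precision, nbytes in [("16-bit", 2), ("8-bit", 1), ("4-bit", 0.5)]:
    mem = weight_memory_gb(N_PARAMS, nbytes)
    count = gpus_needed(mem, GPU_MEMORY_GB)
    print(f"{precision}: {mem:.1f} GB of weights, at least {count} GPUs")
```

At 16-bit precision the weights alone come to 810 GB, which is why a model of this size cannot be served from a single accelerator and why quantization changes the hardware bill so much.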

Original Description

Large Language Models Bootcamp Information Session

Join us for an exclusive Information Session where we break down everything you need to know about our 5-day Large Language Models Bootcamp (available in-person & online).

➡ Why Attend the Info Session:
✅ Gain in-depth understanding of the structure, agenda, and hands-on curriculum.
✅ Get your questions answered in the live interactive Q/A session.
✅ Learn how our hands-on project will get you building LLM applications in just 5 days.
✅ Learn about the renowned instructors and industry-leading partners who are a part of our bootcamp faculty.

➡ Who Should Attend?
AI enthusiasts, data professionals, and product leaders looking to gain hands-on experience and leverage LLMs for innovation and growth.

We look forward to meeting you!



The Large Language Models Bootcamp provides a comprehensive introduction to LLMs, covering foundational and advanced topics, with a focus on hands-on learning and real-world applications. It also covers the trade-offs among LLM deployment models and how to balance them. Attendees will gain practical experience with LLMs, including fine-tuning, deploying, and evaluating models.

Key Takeaways

Set up a vector database
Query a vector database
Fine-tune a model with distillation, transfer learning, and quantization
Deploy a fine-tuned model on Azure
Build a RAG application with Python
Evaluate RAG pipelines
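
The first two and the last of these takeaways can be illustrated with a small, self-contained sketch. It is not the bootcamp's code. The bag-of-words `embed` function stands in for a real embedding model, `ToyVectorStore` stands in for a real vector database, and `context_recall` is a simplified substring-based proxy for the context recall metric mentioned in the session (evaluation frameworks typically compute it differently, often with an LLM as judge). All names here are hypothetical.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy embedding: word counts, standing in for a real embedding model."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class ToyVectorStore:
    """In-memory stand-in for a vector database."""

    def __init__(self) -> None:
        self.items: list[tuple[str, Counter]] = []

    def add(self, text: str) -> None:
        # Store the text alongside its vector.
        self.items.append((text, embed(text)))

    def query(self, question: str, k: int = 2) -> list[str]:
        # Rank stored texts by similarity to the question, keep the top k.
        q = embed(question)
        ranked = sorted(self.items, key=lambda it: cosine(q, it[1]), reverse=True)
        return [text for text, _ in ranked[:k]]


def context_recall(ground_truth_facts: list[str], retrieved: list[str]) -> float:
    """Fraction of ground-truth facts found verbatim in the retrieved context."""
    context = " ".join(retrieved).lower()
    found = sum(1 for fact in ground_truth_facts if fact.lower() in context)
    return found / len(ground_truth_facts)


store = ToyVectorStore()
store.add("The bootcamp runs for five days in Seattle")
store.add("Fine-tuning exercises run on cloud GPUs")
store.add("Participants only need a laptop with a web browser")

hits = store.query("what laptop do I need", k=1)
print(hits)
print(context_recall(["laptop", "web browser"], hits))
print(context_recall(["laptop", "GPU cluster"], hits))
```

In a real pipeline the ground-truth facts would come from a curated question-and-answer set, which is the role of the "questions in the ground truth" discussed in the transcript.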

💡 LangChain is a tool of choice for connecting to vector databases and building LLM applications, and provides context-building tools for memory and state management.

