Large Language Models Bootcamp - Information Session

Data Science Dojo · Beginner · 🧠 Large Language Models · 1y ago

Skills: LLM Foundations 90% · Prompt Craft 80% · LLM Engineering 70% · Fine-tuning LLMs 60% · Multimodal LLMs 50%

Key Takeaways

The Large Language Models Bootcamp covers a comprehensive set of techniques for the generative AI and large language models ecosystem, including prompt engineering, embeddings, and vector databases, using tools such as Llama 2, OpenAI, and LangChain. The bootcamp focuses on vendor-agnostic skills and includes a project on the last day, where participants build an LLM application from boilerplate code and apply what they learned during the first four days.
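The transcript below describes text being turned into embeddings, stored in a vector database, and queried with keyword, semantic, and hybrid search, including a "hybridness" parameter that tilts results toward semantic or keyword matching. The following is a minimal, self-contained sketch of that idea in plain Python, not Weaviate's API. The document vectors are hypothetical hand-made stand-ins for real embeddings, the keyword score is a crude term-overlap stand-in for BM25, and the fusion is a simple weighted sum. It assumes the convention Weaviate uses for its alpha parameter, where 1.0 means pure vector search and 0.0 means pure keyword search.

```python
import math

# Hypothetical toy corpus: each document has text and a hand-made 3-dimensional
# "embedding" standing in for the output of a real embedding model.
DOCS = {
    "doc_refund": ("how to request a refund for an order", [0.9, 0.1, 0.0]),
    "doc_returns": ("return policy and money back guarantee", [0.7, 0.3, 0.2]),
    "doc_shipping": ("shipping times for international orders", [0.1, 0.9, 0.1]),
}


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity: the dot product divided by the product of the norms."""
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norms if norms else 0.0


def keyword_score(query: str, text: str) -> float:
    """Fraction of query terms that appear in the text (a crude stand-in for BM25)."""
    terms = query.lower().split()
    words = set(text.lower().split())
    return sum(term in words for term in terms) / len(terms) if terms else 0.0


def hybrid_search(query: str, query_vec: list[float], alpha: float = 0.5, k: int = 2) -> list[str]:
    """Return the ids of the top-k documents by a weighted sum of both scores.

    alpha = 1.0 gives pure vector (semantic) search; alpha = 0.0 gives pure keyword search.
    """
    scored = []
    for doc_id, (text, vec) in DOCS.items():
        score = alpha * cosine(query_vec, vec) + (1 - alpha) * keyword_score(query, text)
        scored.append((score, doc_id))
    # Highest score first; Python's sort is stable, so ties keep insertion order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc_id for _, doc_id in scored[:k]]


if __name__ == "__main__":
    query_vec = [0.9, 0.1, 0.0]  # hypothetical embedding of the query "money back"
    for alpha in (1.0, 0.5, 0.0):
        print(alpha, hybrid_search("money back", query_vec, alpha=alpha, k=1))
```

With the query "money back" and a query vector pointing in the same direction as the refund document, alpha=1.0 ranks doc_refund first, while alpha=0.5 and alpha=0.0 rank doc_returns first, because it is the only document containing both query terms. Production systems typically normalize the two score types before fusing them; this sketch relies on both scores already lying between 0 and 1.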

Full Transcript

Okay, I think we are live. It is 11 a.m. Pacific, so we'll go ahead and get started in just a moment. This is our usual information session on the Large Language Models Bootcamp. My name is Raja Iqbal; I'm the chief data scientist at Data Science Dojo, and I'm also one of the lead instructors for the Large Language Models Bootcamp and some of our other training programs. We are one of the oldest and longest-running data science bootcamps in the industry: there are bootcamps that started around the same time we did and don't exist anymore. We have been running the Data Science Bootcamp and other analytics, data, and AI upskilling programs, and most recently we started our Large Language Models Bootcamp, where we cover a very comprehensive set of techniques for the generative AI and large language models ecosystem. We have a lot of graduates, perhaps more than any other bootcamp; we have served more than 3,000 companies, and we have alumni from pretty much every country or territory on the planet. So why do we actually need to go through a bootcamp for large language model applications? Sometimes you get this thought: "I can just clone this Git repository and build this RAG application in less than 30 minutes." Well, if it is a single-page PDF, yes, you can build something very quickly. But the reality is that when you build these applications, you have to worry about a lot more than a single PDF. What about the diversity of
different types of files? What if it is a CSV, an Excel file, a PowerPoint, a Word document, a text file, or a Python file? What if the data is not stored on your disk but in Salesforce, HubSpot, or Dynamics, or in Dropbox or SharePoint? What if your LLM application has to take actions, such as sending emails? What if it has to reason? There are a lot of nuances that may not be obvious when we set out to build a simple or an enterprise LLM application. There is evaluation: how do you know that your models are performing correctly and that the responses are correct? How do you know they are not hallucinating? How do you make sure the input prompts are not toxic, and that someone is not trying to hijack your prompts? What if you're worried about the outputs of the model, the completeness of the response, or its factual accuracy and consistency? What if all of this works, but you are consuming too many tokens and the costs are high? Prompts can be very brittle, so that even the smallest change in a prompt changes the result; how do you make sure your prompts are not susceptible to that brittleness? How do you make sure you stay within the context window limit? GPT-3.5 Turbo has a 16K context length, as opposed to GPT-4o, which has 128K, and I believe Llama 3 has an 8K context limit. What does that mean, and what are the implications? How do you prevent, detect, and mitigate hallucinations? How do you deal with regulatory challenges? You may be able to build something, but what if it gets you in trouble with the authorities? What if
you're not able to reproduce the responses? What if you have built everything, but in an interactive application the responses come back very slowly? What if you don't want to use off-the-shelf closed-source models and want to use open-source models instead; how do you self-deploy them? How do you handle governance? This was just a handful of questions, and we cover all of this in our bootcamp. I will not just say that we do it: I will actually go through the curriculum and show you how we have structured the bootcamp. A bit of context on how this bootcamp started: we have been a training company, but we also ventured into services and eventually built a product. We have built an enterprise product that is currently used by some big companies, including $800 billion companies, that rely on our platform to build what we call co-workers and teammates: AI assistants or, broadly speaking, AI agents. The platform allows enterprises to build AI agents that meet all of these conditions without needing to write any code. We are talking about some big enterprises in international development and cloud computing; our customers include some very big names. So what are we going to cover in the bootcamp? Before you can appreciate the curriculum, think of this as a very high-level tutorial of how an LLM application is typically built. At the core you have the large language model. It can be your own self-hosted LLM, you could be hosting an open-source LLM in the cloud, or you could be leveraging some of the
existing closed-source models directly. Examples of closed-source models are GPT-4o or any of the OpenAI models, but you can also use open-source models like the Llama series: Llama 2, Llama 3, Llama 3.1. So at the core of all of these applications is the large language model, but that is not all. In fact, this is probably the easiest part of building an LLM application, because in 99-plus percent of cases you are not going to build models from scratch. Depending on whether you are an AWS shop or an Azure shop, you will just go and use or repurpose some of the existing models. Once you have these models, you need a vector database. What is a vector database? Just like you need a SQL database, a NoSQL database, or a graph database, there is a category of databases called vector databases, and as the name suggests, you store vectors in them. What are vectors? They are embeddings: vector representations of concepts, of your sentences, paragraphs, pages, and books, of any of those ideas. We push text into embedding models, vectors come out, and the vectors are pushed into the vector database. In addition, when you're building an LLM application, you need what, for lack of a better word, we call orchestration. Loosely speaking, you need a framework to read data from different files or data sources and to deal with different file formats. LLMs are stateless: they don't know what the previous query was or what the next query will be, so you
have to build some sense of memory into these applications. Then, what if you want to create chains? What if you want to build agentic behavior into your agents, or have multiple agents collaborate with each other? For that purpose we use LangChain, and our bootcamp covers everything I'm talking about. Then there is the idea of LLM caching. Why? Say you have an LLM assistant for support articles: what if the same question is asked repeatedly by multiple people? Do you want to spend money generating the response every single time, since inference is expensive, or do you want to cache the responses? How do you log how many queries came in and what their latency was? Broadly speaking, the mechanism for logging everything is what we call LLMOps. Then we have guardrails. The two areas I consider the hardest to implement are data governance, meaning access controls, and AI governance. One of the criticisms of tools like Microsoft Copilot is this: what if you accidentally have access to some file, and when I query, I get a response containing information that I should not have access to? That falls broadly into data and AI governance. When I talk about AI governance, I'm asking: are the responses toxic? Can they get me in trouble? What if they are politically incorrect, or contain personally identifiable information (PII)? How do you prevent this,
and how do you put guardrails in place? We call them input guardrails: preventing this at the source. If you can sense that someone intends to ask a question that should not be answered, how do you stop it before it even reaches your large language model? There's a lot more, and of course I cannot cover every aspect in this limited time, but we cover the entire ecosystem, with in-depth discussion of all the components I mentioned, and I will show you what it looks like in our learning platform. As for the technology stack: on the LLM side, we use OpenAI and Llama 2 models, and we use Hugging Face a little bit. For vector databases we mainly focus on Weaviate, but we have exercises for others as well. We are teaching the skill, not a specific platform or vendor, so we think of the bootcamp as vendor-agnostic. We have a lot of exercises. On vector databases we spend about five to six hours, almost three-quarters of a day. The discussion of foundation models spans multiple days. For orchestration with LangChain, I would like to think we have one of the most comprehensive treatments of LangChain out there: we spend almost eight hours on why you need it and on practical exercises, and I will show it in a moment. For LLMOps, Arize is our partner, and we go through the deployment aspect. Then we have RunPod and Union: first, how do you fine-tune a model, and once you have fine-tuned the
model, how do you deploy it, either in a container or in a function-like, serverless setting? The bootcamp is 40 hours over 5 days: it starts at 9:00 a.m. on Monday and ends at 5 p.m. on Friday of the same week. The same bootcamp is offered simultaneously: some people sit in the classroom in Seattle, and others attend from around the globe. We have had people attend from as far as Australia, the Middle East, Europe, Canada, and Brazil, so we have coverage all over the globe. We deliver the content in a manner that keeps online attendees as engaged and involved as the in-person attendees. I will go over the curriculum, but one thing I forgot to mention is that we have a project on the last day of the bootcamp. We help you build an LLM application: we give you boilerplate code, and on the fifth day you apply the learnings from the first four days to the project. We have a partnership with the University of New Mexico, so we are partnering with an accredited institution. As a consequence, you may be able to get this course sponsored by your employer from the central HR, L&D, or professional development budget. For those of you who may not know, many people get approvals from their managers, but each company also has a dedicated, fixed budget allocated for every employee, and because of
our program being backed by an accredited university in the United States, you may be able to attend the bootcamp for free: 100% covered in many cases, or at least mostly covered. So check with your employer, and if you need help, let us know. These are only some of the employers we know cover the bootcamp 100%, but in your case you may want to follow up with your HR and with us. For the next bootcamp, these are our partners: LangChain and Weaviate, two of the most prominent names in the industry; Neo4j, for how to build a graph RAG, which I will talk about; Arize, for observability and LLMOps; Union, for deploying and leveraging the models; and RunPod, who give us compute by providing free GPU clusters for our attendees so they can fine-tune their models. Then we have Securiti AI, who are in the space of data and AI governance. What is needed to attend the bootcamp? Just bring your laptop. As long as you have a computer with a web browser, that's all you need, because we give you all the labs and all the infrastructure. If you need large language model tokens, we will pay the bills: we cover up to $500 in credit for all the software and cloud services needed. GPUs can be expensive, so when we do model fine-tuning, we give you the GPU cloud. We have more than 100 Jupyter notebooks, and you can run them in our cloud deployment, so you don't have to worry about installing things on your computer, how much
compute you have, or whether you have IT permissions on your work laptop. So I'm going to go ahead and get started with the curriculum. Before we go there, one of the common questions is: what are the prerequisites? Do I need to know classic machine learning before I come in and learn generative AI? While it may be helpful to know traditional machine learning, it is not a prerequisite; we take an approach where we don't assume any background in machine learning. But you must have a basic working knowledge of at least one programming language: you should be able to read code in at least one language. In addition, you should have a general interest in the data and analytics space and, ideally, some problem that you actually care about and are trying to solve. Overall, we have had people with minimal coding and minimal machine learning background, and they did just fine; I will show you why that is the case. We start off with the end-to-end bigger picture at a very high level. We explain why you need a large language model, what a large language model is, what prompt engineering is, what an embedding is, what a vector database is, what guardrails are and why you need them, what the challenges are when you build an LLM application, and the security, legal compliance, PII, and regulatory issues. We also talk about different ways of customizing a large language model: using RAG or using fine-tuning. Once you have the bigger picture, we start taking a deep dive, and this is where we start talking about things like
(sorry, not this slide deck, but this one). So this is our learning platform, and you can see that we have a lot of content here. This is our session on embeddings, and you can see that we start with a very basic introduction: what is discriminative AI, and what is generative AI. Luis Serrano is the one who teaches this session. We talk about what deep learning means, with very simple, easy-to-understand examples. We start from there and go all the way to the attention mechanism and the Transformer architecture. (I'm sorry, the pages are loading a bit slowly; let me give it a few more seconds.) We talk about multi-headed attention, the Transformer encoder-decoder architecture, and the attention mechanism; we talk about tokens, what they are and how they are generated, and what an embedding is. By the end of this session you have a very good understanding of the foundational aspects of large language models. Now that you know what embeddings are and what the attention mechanism is, you have all the foundations set up, and then we talk about vector databases: you know what embeddings are, but how and where do you store them? (I'm sorry these are not loading promptly. My apologies, everyone; I will close my web browser and launch it all over again. Okay, we are here.) For vector databases, we start with an introduction to vector databases. I hope this loads better. Yes, it
loaded. Okay, so we talk about how a vector database works, the types of searches in vector databases, the different indexing mechanisms, the nuances of working with vector DBs, how you scale a vector database, reliability, and other aspects such as scaling issues. There are more than 100 slides, and we take about two and a half to three hours just to go through the theory. For example, this is what a SQL database query looks like, and this is what a vector database query looks like, and we explain all of this both intuitively and mathematically. We talk about vector search, keyword search, semantic search, and hybrid search, and then filtering with something like SQL WHERE clauses. You can see that we talk about HNSW indexing and what it means to index in this manner, including the notion of layered proximity graphs, which you see here. We talk about how these embeddings are stored. For an LLM application, even for a small business, the number of embeddings can be north of tens of millions, if not hundreds of millions, and at Google or Facebook scale you may be looking at hundreds of billions, if not trillions, of embeddings. So when a new query comes in, how do they actually search a vector database with sub-millisecond response times? We talk about all of those
nuances. Sebastian from Weaviate will be covering this session; it is a very engaging session, and you'll love the theory discussion. But that is not all: we do not just do the theory, we also do practical exercises. For instance, the vector search we talked about over there, you will actually perform here. Rather than just handing you a Weaviate URL and API keys, we give you the credit to create your own Weaviate server, and we give you the OpenAI key. Then you actually create a collection, and we walk through these labs step by step: what does this mean, this is an embedding, and in this case I'm searching for a vector that is close to this vector, and so on. Once that is done, we go to similarity search and hybrid search. I will not click on every single lab, of course, but it is the same drill: what if you need both keyword search and semantic search? This lab shows you how to run these queries and commands. Here we are loading some data into a collection (think of a collection as the SQL equivalent of a table), and we run it, run it again, and keep running. In this case I'm doing a hybrid query, and we go step by step explaining things. These five or six labs for vector databases take another two and a half to three hours, and we go through every possible detail. We try changing the query and changing this hybridness
parameter: do we want the search to be more semantic or more keyword-based? We cover all of this in depth, and we see that when we change the hybridness to a different value, the results may change, which gives you an intuitive understanding of what happens under the hood when you're using a vector database. In this case we are filtering, so it's a combination of search and filtering, and so on. Then we have examples of generative search, multi-tenancy, and vector compression: when you have too many vectors, search becomes very expensive, and compressing the vectors can make it less expensive. We also cover semantic caching, which I mentioned earlier: you store your queries and responses to reduce cost and latency. We also have a session from Adam of Neo4j, who covers graph RAG: we talk about knowledge graphs and how they can be used in the context of vector databases and RAG, and we have labs for that as well. After we have done all of this, we start talking about how to build a retrieval-augmented generation application, commonly known as RAG. Once again, we use LangChain for this, and we go through LangChain: why do you need LangChain, and why do you need prompt templates? I'm assuming most of the audience already knows how to use ChatGPT, but what if you need your prompts to be variable? Maybe you're building a UI where the customer enters a location, and you want it to be dynamically plugged into a
prompt; this is a prompt template. Then we talk about example selectors: how do you use few-shot examples? You can see that we have really streamlined the process. You don't have to install anything on your computer; we take care of all the compute and keep these libraries updated, so you can focus on only one thing, understanding how LLM applications are built, while we do the heavy lifting on your behalf. I can keep going: we talk about retrieval, how you connect your application to different data sources, how you chunk your documents, and different kinds of chunking techniques. Then we cover how to build chains, and, since LLMs are stateless, how to bring the concept of state, or memory, into these models. How do you build agentic applications? We start with a search agent: you can see that we install DuckDuckGo search and import the DuckDuckGo search tool from langchain.tools.ddg_search.tool, and we go step by step. First, of course, we rationalize why you need a search agent, or for that matter any agent: what does it mean to have an agent? Then we go through increasingly complex examples until we reach this commonly used, and sometimes abused, term: agentic frameworks. We cut through the hype and talk about multi-agent frameworks where you have dependency graphs and so on. It is fairly technical, but the nice thing is that even if you are not a coder, you do not have to write any code from scratch; you may modify it, and as long as you can read code and understand what is going on, it will still give you a very good deep dive into how these things are done. That applies if you're a technical product manager, a people manager, a dev manager who does not actively write code, or a founder. If you are a technical product manager who wants to guide or manage projects where a technical team is building these applications, you must understand the limitations: why you worry about tokens, why caching is important, why you need a multi-agent system, and why you need to worry about AI governance. We cover all of that in this bootcamp in a very practical, hands-on manner. Okay, let me see. We also talk about fine-tuning, and I will pace up a little here. In fine-tuning, we talk about fine-tuning a Llama 2 7-billion-parameter, 4-bit
quantized model. We first go through what fine-tuning is, what transfer learning is, what quantization is, and what low-rank adaptation (LoRA) is, really understanding the theoretical aspects, and then we start fine-tuning the model. We give you a GPU cluster and sample code; we download the model from Hugging Face, fine-tune it, compare the responses of the model with and without fine-tuning, and form our opinion on when fine-tuning is a good idea. We of course have a session on prompt engineering. Then we have a session on LLMOps: how do you deploy these models, how do you put guardrails in place, and how do you observe your model? Then we have a detailed session on evaluation. Evaluating LLMs is actually hard, and we talk about all its aspects, from the traditional NLP, language-translation-style evaluation approaches all the way to retrieval-augmented generation evaluation approaches, so you can see that it is quite in depth. In RAG, there are some additional evaluation steps you need to take: for evaluating model performance there are different metrics, such as faithfulness, answer relevancy, context recall, and context precision, and we go through all of them, explaining line by line. What else? On the last day of the training we work on a project: we give you boilerplate code for a Streamlit app along with some exercises, and you will go
ahead and add more: we will ask you to add a chain, add some memory, and implement this agent and that agent. It's a project that puts together everything we have taught during the bootcamp. Okay, so where are we? Let me go back; I ended up closing this. Meanwhile, if there are any questions about the logistics or the curriculum, I'm more than happy to answer them. These are some of the speakers who have spoken in the past or will be speaking this time. Of course I'm one of the instructors, and I know for a fact that Luis is confirmed, Adam is confirmed, and John is confirmed; for Sage, I'll have to check with the team, depending on availability. These are not just anyone presenting; they are seasoned professionals in the LLM and AI space, and we will have them depending on their availability. These are our partners; I think we have put together a very good program. These people are practitioners, and as you can see, they love the program. And these are our customers, who come from a large number of companies. We have more than 3,000 companies in our portfolio, and even for the Large Language Models Bootcamp, I think we have crossed perhaps 70 or 80 different companies. This is just a sample; check out our website if you want to know who has attended. These are real people, not stock photos, so go and talk to them and ask them what they think about our bootcamp. The next bootcamp is happening simultaneously in Seattle and
online, February 3rd to 7th, which I think is less than two weeks from now. If you cannot travel, for scheduling or budget reasons, you can attend the same boot camp online. You will see the people in class and they can see you; you hear their questions, they hear yours, and you can interact with the instructors. I think that would be it from my end. Let me see if there are any questions; where is the Q&A?

Raja, do you cover performance evaluation of foundation models?

We do, actually, in a lot of detail. I cannot see your name, and it's better to address you by your name.

Call me Shar.

Okay, Shar, if I heard correctly. Yes, we do, and I can show you the actual content around evaluation. I'm not just an instructor; I'm also an engineer and an architect for a product that we have built. Just to give you an idea, there's no textbook for this right now. This is a distillation of everything we have done while building our product. We talk about the rationale for LLM evaluation, and then about the typical problems, very systematically, because when you have lived it and done it in detail, the way you look at things is very different from doing it theoretically. Then we talk about the data sets and metrics you need, we go through them, and of course the tasks. We talk about how it could be a
language understanding task or a generation task; we cover all the types of tasks you can potentially evaluate. Those tasks are performed on reference data sets: some data sets are only for reasoning, some are for natural language understanding. We go through why the different data sets exist and when to use which ones. Then we talk about how you actually set up an evaluation data set for your own scenario. Then we go to evaluation metrics: what is the BLEU score, what is the ROUGE score, and how do you use Ragas. I cannot go through everything here, but you can see that there are corresponding labs for the slides you saw. I can easily say that I'm proud of what we have built, because I'm very confident that when you leave this boot camp you will have a very good understanding, and depending on your background, if you're a developer, you should be able to build things from scratch.

Any specific tools you are using for evaluation?

As you can see, there is a package called Ragas, so we have labs with Ragas, but we also have labs with G-Eval, where, technically, one model evaluates the response from another model. So we are using a combination of different libraries. Does that answer your question?

Have you looked into FMBench, which AWS has released?

So, Shar, here's the thing: as I said earlier, in the training we focus on the skills and not the tools
part of it. While we did select some tools, there's so much that can be done, and it is only 40 hours. That looks like a lot, but believe me, we struggle to finish all of this content, and that's why we have optimized everything. What we do is cover every aspect using one tool, and Ragas is fairly popular right now, if not the most popular, so we cover it. Once you have understood and internalized it, it's almost like being a good Java programmer who can pick up C#, or a good C programmer who can pick up Python: you learn one framework or library and the problem it solves, and after that you can be on your own. Even if we covered one more library, there would still be dozens that we did not cover. I hope that makes sense.

Yes, that's good. The other one is agents. I think they are calling 2025 the year of agentic AI.

Yes. The people just sitting on social media, the keyboard warriors, can give things names, and there's some truth to it, but this is more about coining engaging, clickbaity terms. In reality, when you're building agentic systems, a lot more goes into it. You cannot build good agents without proper evaluation, without proper AI governance and data governance, without proper guardrails, or without an understanding of LangChain, of how to connect to different data sources, and of how to deploy the models and the web app properly. So yes, while there is a
lot of hype around it, and we do cover it, you cannot build something agentic in nature without understanding all of this. If you do not understand embeddings well, you cannot build a good agent, and so on.

My question about agents is about single agents versus multi-agent systems. The interesting thing happening, especially for people working in that space, is that debugging is becoming an important skill. Are you going to cover how to debug issues when a single agent or multiple agents are used in the workflow?

To the best of our knowledge, yes, but you understand: a coach can tell you how to deal with certain things, and there can be general guidelines, but there are certain things you can only learn by being in that high-stakes game. I'm giving you a very honest answer; I'm a straight shooter, and I'm not going to say we will teach you everything, because there are infinitely many scenarios you may have to debug. I can honestly tell you that I scratch my head on the product side too; we run into a lot of challenges, including architectural ones. We share as much as possible, but there are brick walls that you will hit and will have to face yourself. We'll share some of the interesting brick walls that we hit. As I said, we have built a product, and we have built guardrails and governance frameworks around it. We also talk about agentic ReAct workflows and how you deploy them in the cloud so that they remain scalable,
whether you deploy them in a VM, in a container, or in a serverless fashion; we will discuss all of those things. The good thing is that the people who attend this boot camp are very serious about it. This is not a fluff presentation; this is the real deal. Many of the people who come in have already attempted building things, so the quality of the questions makes it a learning experience for me as well; a lot of learning comes from them. So debugging, yes, we'll talk about it, but can we cover every possible scenario? I do not think so.

Okay. I think I missed the initial part of the presentation, so I just want to understand: there are paid courses on platforms like Udemy or Coursera. How do you differentiate the one you are offering from them? They also have similar courses.

Yes. With all due respect, credit is due where it is due: there are some good courses out there. But there is a difference between those courses and an end-to-end course. With the courses you will find there, there's a course on prompt engineering, a two-hour course on LangChain, and so on, and now you have to go and put the pieces together yourself: should I do this first or that first? Here, all the infrastructure is ready for you, so all you have to do is hit the ground running.
We have streamlined everything for you, because a lot of the time people do not know what they should learn first. It is choreographed, or orchestrated, end to end so that you are set up for success, and it covers every aspect. You may take a very disconnected course on prompt engineering and then ask: what do I do now? I know what prompt engineering is, but how do I build an application? This course starts you from scratch and takes you all the way to building an application without missing any aspect of it. Also, I'm one of the lead instructors and I created this curriculum. Maybe I'm an educator first, but I'm also an engineer; I've been doing machine learning for a long time, since before it was mainstream. With me and the other instructors, who are amazing people, you will be learning from people who have actually done it, not people who have read a paper or watched a YouTube talk and are now presenting a slide deck. When I tell you about Ragas, I'm not just presenting it; I have actually used Ragas in practice. That is the differentiator: it is offered by practitioners to turn learners into practitioners. And it is very comprehensive: not a one-, two-, three-, or four-hour course, but a course that covers things end to end.

Okay, that's good. Based on your experience, I'm assuming you will be covering which foundation models should be used in which scenarios, and which ones should not be used?

Yes, that's a tough one. If I asked you, Shar, what car should I buy: should I buy a Tesla or a minivan
or an SUV, or something else? What would be your answer?

Depending on the use case.

Depending on the use case, right. So yes, we'll talk about it: generally speaking, what the limitations are, and so on. But as I said, there are certain things you do in a textbook setting. I can give you an example: in the past I've worked with great statisticians with strong machine learning backgrounds who were hung up on "we cannot apply this test to this data because it is not normally distributed." We'll give you all of the science, but we'll also teach you an engineer's bias for action. Scientists want everything to be perfect, with all the assumptions satisfied; an engineer says, well, it works, let's move on. That is the mindset we are going to inculcate. And by the way, if you are interested, go to our boot camp website; there should be a link to the people who have attended the boot camp, or reach out to them on LinkedIn. We have dozens of people there, and overall hundreds of real people on our website. Reach out to anyone and ask them what they think about the boot camp. We are that confident; these real people are our champions.

Can I set up a meeting with you?

Oh, yes. I see that the team has been posting links in the webinar chat, so you can go there and set up a time. I think there is another question, from George: "Is there a sequel to the basic boot camp, the next level?" George, I'm assuming you're referring to this boot camp as the basic boot camp. No, nothing yet,
because this is already, well, maybe I did not do a good job of explaining it, but this is actually pretty intense, and it is already an advanced boot camp: we go into the details, deep into the weeds. We don't have an advanced boot camp beyond this at the moment, and I don't know what that boot camp would be at this point, so we do not have any other boot camp as a sequel to this one. Thank you, George, and feel free to reach out. I'm happy to talk to you one-on-one for 30 minutes if you set up a time, and answer any questions. Maybe there is something we can include here, or maybe it's already there; I would love to hear your thoughts on anything we can help with.

Are there any other questions? If there are none, please feel free to reach out to us. Our sales team is not pushy; we won't harass you. We take a very different approach to sales: if you don't need it, we'll tell you that this is not the boot camp you're looking for. Sometimes people have been shocked: "Are you telling us you don't want us as a customer?" Yes, that's how we do business. So feel free to reach out; in any case, if we can help you, we would love to chat. Thanks to everyone who attended, and I'm looking forward to seeing some of you at the boot camp.
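
The session lists BLEU and ROUGE among the traditional NLP evaluation approaches. As a rough illustration, and not the implementation used in the bootcamp labs, ROUGE-1 can be sketched as clipped unigram overlap between a candidate text and a reference text. The sketch assumes naive lowercase whitespace tokenization. Reference implementations such as the `rouge-score` package add stemming and other normalization, so their scores can differ. The LLM-judged Ragas metrics mentioned in the session (faithfulness, answer relevancy, context recall and precision) are not shown here.

```python
from collections import Counter


def rouge1(candidate: str, reference: str) -> dict[str, float]:
    """ROUGE-1 precision, recall, and F1 from clipped unigram overlap.

    Assumes naive lowercase whitespace tokenization (no stemming,
    no punctuation stripping).
    """
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    # Counter intersection keeps min(count in candidate, count in reference),
    # so a repeated word is only credited as often as the reference allows.
    overlap = sum((cand & ref).values())
    precision = overlap / sum(cand.values()) if cand else 0.0
    recall = overlap / sum(ref.values()) if ref else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"precision": precision, "recall": recall, "f1": f1}


if __name__ == "__main__":
    print(rouge1("the cat sat on the mat", "the cat is on the mat"))
```

In this example, five of the six tokens overlap ("the" twice, "cat", "on", "mat"). Precision, recall, and F1 therefore all come out to 5/6.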

Original Description

🚀 Transform your data strategies with our upcoming Large Language Models Bootcamp! Join us for an engaging information session where we unveil the exciting details of our upcoming 5-day bootcamp (both in-person & online).

➡ What to expect during the information session:
• Overview of the bootcamp structure and agenda.
• In-depth exploration of the core topics covered.
• Insight into hands-on projects and real-world applications.
• Meet the expert trainers and learn about their experiences.

➡ Who should attend?
Whether you're an AI enthusiast, a tech professional, a creative thinker, or simply someone eager to explore the possibilities of large language models, this event is tailored for you. We look forward to meeting you!

The Large Language Models Bootcamp is a comprehensive course that covers the fundamentals of LLMs, including prompt engineering, embeddings, and vector databases. The course focuses on vendor-agnostic skills and ends with a project on the last day, where participants build an LLM application from boilerplate code and apply what they learned in the first four days. It starts from the fundamentals but, as the instructor notes in the session, it is intensive and already goes into advanced topics such as fine-tuning, quantization, and low-rank adaptation (LoRA).
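
The summary above names low-rank adaptation (LoRA). The idea, as commonly described, is to freeze a pretrained weight matrix W (d × k) and train only a low-rank update B·A. Here B is d × r and A is r × k, with r much smaller than d and k, and the update is scaled by alpha / r. The pure-Python sketch below illustrates the trainable-parameter savings and the adapted forward pass y = W·x + (alpha / r)·B·(A·x). It is not the bootcamp's lab code, and the layer size and toy matrices are hypothetical values chosen only for illustration. Real fine-tuning would use a library such as Hugging Face PEFT.

```python
def matvec(m: list[list[float]], v: list[float]) -> list[float]:
    """Multiply a matrix (given as a list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def lora_param_counts(d: int, k: int, r: int) -> tuple[int, int]:
    """Trainable parameters: full fine-tuning of W vs. LoRA factors B and A."""
    full = d * k          # every entry of W is trained
    lora = r * (d + k)    # B has d*r entries, A has r*k entries
    return full, lora


def lora_forward(W, A, B, x, alpha: float) -> list[float]:
    """y = W x + (alpha / r) * B (A x); W stays frozen, only A and B train."""
    r = len(A)                      # rank = number of rows in A
    base = matvec(W, x)             # frozen pretrained path
    update = matvec(B, matvec(A, x))  # low-rank path, never forms B @ A
    scale = alpha / r
    return [b + scale * u for b, u in zip(base, update)]


if __name__ == "__main__":
    # Hypothetical layer: a 4096 x 4096 projection adapted with rank 8.
    full, lora = lora_param_counts(4096, 4096, 8)
    print(full, lora, f"{lora / full:.2%}")

    # Toy example: d=2, k=3, r=1.
    W = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]  # frozen weights
    A = [[1.0, 1.0, 1.0]]                    # r x k
    B = [[0.5], [0.0]]                       # d x r
    print(lora_forward(W, A, B, [1.0, 2.0, 3.0], alpha=1.0))
```

For the hypothetical 4096 × 4096 layer, rank 8 trains 65,536 parameters instead of 16,777,216, which is about 0.39%. In the original LoRA formulation, B is initialized to zero so that training starts from the pretrained model's behavior. The nonzero B in the toy example exists only to make the update visible, and the example returns [4.0, 2.0].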

Key Takeaways

Learn the basics of LLMs
Build an LLM application using boilerplate code
Fine-tune an LLM
Evaluate LLM performance
Use prompt engineering to optimize LLM performance
Implement a retrieval-augmented generation (RAG) system
Use vector databases to store and search embeddings
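
The takeaways above mention storing and searching embeddings in a vector database. Without indexing and persistence, the core operation is nearest-neighbor search by cosine similarity. The sketch below is a brute-force, in-memory stand-in for a vector database, not any particular product's API. The three-dimensional vectors are made-up toy values rather than output from a real embedding model. Production vector databases typically use approximate nearest-neighbor indexes instead of scanning every stored vector.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """cos(a, b) = (a . b) / (|a| |b|); returns 0.0 if either vector is zero."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query: list[float], store: dict[str, list[float]],
          k: int = 2) -> list[tuple[str, float]]:
    """Brute-force search: score every stored vector and keep the k best."""
    scored = [(doc_id, cosine_similarity(query, vec))
              for doc_id, vec in store.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


if __name__ == "__main__":
    # Toy "embeddings" (hypothetical values, not from a real model).
    store = {
        "fine-tuning notes": [0.9, 0.1, 0.0],
        "evaluation notes": [0.1, 0.9, 0.1],
        "deployment notes": [0.0, 0.2, 0.9],
    }
    for doc_id, score in top_k([0.8, 0.2, 0.0], store, k=2):
        print(f"{doc_id}: {score:.3f}")
```

Cosine similarity compares direction rather than magnitude. Because of that, the query above ranks "fine-tuning notes" first and "evaluation notes" second, even though none of the vectors are identical.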

💡 The Large Language Models Bootcamp provides a comprehensive introduction to LLMs, including prompt engineering, embeddings, and vector databases, and focuses on vendor-agnostic skills to prepare participants for real-world applications.
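
The summary above pairs prompt engineering with embeddings and vector databases. One common place where these meet in a RAG application is a prompt template that inlines the retrieved chunks and instructs the model to stay grounded in them. The sketch below is one illustrative template, not the one used in the bootcamp. The function name, instruction wording, and character budget are hypothetical choices. A real application would usually budget by tokens, using the model's tokenizer, rather than by characters.

```python
def build_grounded_prompt(question: str, chunks: list[str],
                          max_chars: int = 2000) -> str:
    """Assemble a RAG-style prompt: numbered context chunks, then the question.

    Chunks are assumed to arrive ordered by relevance. Chunks are added in
    order until the next one would push the context past max_chars, a crude
    stand-in for a token budget.
    """
    context_lines = []
    used = 0
    for i, chunk in enumerate(chunks, start=1):
        line = f"[{i}] {chunk.strip()}"
        if used + len(line) > max_chars:
            break  # stop before exceeding the budget
        context_lines.append(line)
        used += len(line)
    context = "\n".join(context_lines) if context_lines else "(no context retrieved)"
    return (
        "Answer the question using only the context below. "
        "Cite the numbered sources you use, and say \"I don't know\" "
        "if the context does not contain the answer.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question.strip()}\n"
        "Answer:"
    )


if __name__ == "__main__":
    print(build_grounded_prompt(
        "What does LoRA train?",
        ["LoRA trains small low-rank matrices while the base weights stay frozen.",
         "Ragas provides metrics such as context precision and context recall."],
    ))
```

Numbering the chunks lets the model cite its sources. An explicit "I don't know" instruction is a common guard against answers that the retrieved context does not support.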