Customizing Large Language Models GPT3 for Real-life Use Cases #gpt3 #datascience

Analytics Vidhya · Beginner ·🧠 Large Language Models ·3y ago

Skills: LLM Foundations 90% · LLM Engineering 80% · Fine-tuning LLMs 70% · Prompt Craft 60%

Key Takeaways

The video covers customizing large language models such as GPT-3 for real-life use cases, including architecture, applications, and fine-tuning, using tools like OpenAI, TensorFlow, and Hugging Face.

Full Transcript

Yeah, so guys, my name is Siddharth, and I am with the data science team at Analytics Vidhya. I'm going to give you a brief introduction to the DataHour sessions. DataHour is a series of webinars conducted by Analytics Vidhya and led by top industry experts, and it is a fun way to understand data science concepts from the leading players in the data tech domain. As the name suggests, it's one hour dedicated to data, so we are hopeful that these sessions will be a great source of enrichment and value for community members.

Now, coming to our session today, which is "Customizing Large Language Models GPT3 for Real-life Use Cases". In this DataHour, our speaker will cover all about LLMs, starting from the fundamental definitions of LLMs and highlighting the important mathematical understanding, through to real-life examples. He will also explain a real-life project based on customizing GPT-3, the learning path for LLM models, and finally how you can kick-start your career in AI.

Before we kick things off and I hand over to our speaker, a quick recap of the housekeeping items. We are recording the session and will make the recording available in a few days on our YouTube channel. Please use the Q&A section for any question you might have during the session, and we will do our best to answer them as the DataHour progresses or towards the end. We will also share a poll for feedback towards the end of the session, which I request you all to kindly fill in.

Now, coming to our speaker for this DataHour. Yeah, a connection issue? Okay, no problem. He is presently working as an AI engineer at Newton School, building the next ed-tech revolution with the power of AI. Prior to this, he was an intern at Samsung, Google TensorFlow, wry, IIT Patna, and IIT Bombay. He has 10-plus years of experience working with companies in India and Silicon Valley to develop AI solutions. He has also worked
with MHRD India, and he is a Presidential Award winner for AI contributions toward nation building. So now, over to you; the virtual stage is all yours.

Yeah, thank you so much, guys, for the warm introduction. I hope everyone can hear me properly. Okay, thank you so much. So let's get started with today's session. Today's session is all about the biggest trend in the industry right now: GPT-3, and the LLMs that come with GPT-3. What kind of revolution is OpenAI driving? How does GPT-3 actually work on the back end, both from the mathematical point of view and from the artificial intelligence point of view? And why is it called the next revolution? Why is everyone saying GPT-3 is going to revolutionize the world, that it's going to change the world? Today we'll discuss all of these things, and even some of the pitfalls as well. I will also show you some practical applications that I am building with GPT-3 right now, and some of the experiments that other people around the world are doing. I will also talk about how you can customize your GPT models, or any GPT-3 application or architecture related to GPT-3, for your own use cases.

So, without wasting any time, let me share my screen. Just give me a confirmation once you can see it. I guess you can see my screen; please give a quick confirmation in the chat. Okay, cool, I guess my screen is visible to everyone. So, without wasting any time, let us get started with GPT-3 and what GPT-3 is all about. Not visible? Okay, just a second, I'm resharing it. Is it better now? No, actually, it is sharing the screen, but a black screen is coming up over there. Just a second. Yeah, let's
say there is some sort of glitch. Okay, can everyone see the screen now? Yeah, it's visible. Okay, so there was some sort of glitch on the Zoom side. Cool, so let us get started with the concepts of GPT-3 and what this term actually means.

When we speak about GPT-3, it is basically nothing but a Transformer at a huge scale. I will not use a lot of big terms or big concepts, so that everyone can easily get the context. What happens is, whenever we train an AI engine, it always gets trained with some sort of data, right? And it's a general hypothesis that the more data you feed to a model, and the better the processing you do on that data, the better the features you get, and the better the results from the algorithm. So it's all a game of data, logic, and the feature engineering we do on the model side. GPT-3 also comes from this kind of concept. It is not the first model that these people introduced; I will also tell you who introduced it and what the history behind it is. Various variations and experiments have been carried out over the last five to six years.

The start was very simple. It began when engineers and researchers were getting stuck with the results of neural networks. Every neural network has some kind of threshold factor. Yes, we can solve multiple problems using the power of neural networks: from classification to predictive analysis to detection problems, neural networks have played very well. But there is always a bias factor with neural networks, because whenever an AI model predicts, it predicts with some sort of probability. At the
end of the day, the prediction turns into some sort of probability. The AI has a level of confidence when it is predicting something, and when the confidence is high enough, it makes the prediction: okay, I can predict it this way.

So what was the major problem with neural networks? I mean any kind of neural network architecture; I am not specifying a convolutional architecture or a residual architecture, I am talking about the whole family of neural networks, because GPT-3 is also a type of neural network, just a modified type. What happens with general neural networks is that there is a high bias factor when there are complex features. Suppose you have to do a very fine-grained analysis between two objects: there is a square of a somewhat bigger size, say 2x, and another square of size 1x, smaller than the bigger one. Say they have the same color, the same dimensions, maybe the same orientation; their placement, everything is the same. Now you ask the neural network: are there two squares here, is there a difference between the two, or is there a single square? Here your neural network will get stuck, because of the bias factor. What this bias factor means is that when two distinguishing or discriminative features do not differ much from each other, the neural network gets stuck and the bias increases. This is one of the factors that researchers pointed out, and it led to the birth of the big models, the larger models.

Here comes a hypothesis that is very famous in our world of artificial intelligence, called Moravec's paradox, and there are several other paradoxes as well. What Moravec's paradox says is that an AI model can perform
very well if you are trying to solve a complex problem, but when you come to an easier problem, it gets stuck, right? So basically the researchers said: what if I make the neural network extremely robust? Think of a scenario where your neural network has data from all around the world, whatever data is present on the internet, and your neural network is just like a human brain, with a very good capability of understanding every single feature end to end. In that case, obviously, your neural network is the best neural network; it can have human-like power to analyze everything.

Okay, I guess there are some issues with my voice. Some people say it is okay, so I guess it might be breaking up for someone. Yeah, you can continue. Okay, cool, thank you so much. So let's get back to the discussion.

That was the very basic case of how the researchers were trying to come up with this new neural network architecture. At that time we had already played the game of attention and the Transformer architecture. On a very basic level: when you have a bunch of features to be processed or engineered in a neural network, each and every feature needs a kind of special attention, a special type of engineering, so that you have discriminative values between them. How the birth of Transformers and attention took place is a completely different discussion, so I am not going into it right now. Let us understand the most advanced versions of these architectures in the field. What the researchers at the OpenAI labs did is, they first tried to make the neural network robust by playing with a larger
amount of data. Obviously, when you speak about a large amount of data, that does not mean it is only about collecting the data; it also includes annotation. They specified every single use case. This was the birth of the large language models. These LLMs are all about a huge database of text queries, voice-based queries, or the kind of Google search queries that people generally make. Obviously it is openly available data; there are no breaches and no copyright issues in collecting it. You might think of it like this: these models have as much data as is actually present on the internet.

Now comes the twist: I have that much data, so what do I do with it? They manually annotated all of this data for specific use cases. If you talk about classification, yes, they annotated for it; if you talk about detection, they annotated for it. They annotated all kinds of text data.

Now I will tell you about the game of this GPT-3 architecture. Many of you who have already used OpenAI Codex or GPT-3 might say: "Hey, in the OpenAI Playground there are some listed use cases, such as code completion, code translation, or transforming code from one programming language to another. But if I give an unknown use case to the language model, GPT-3 can perform equally well. You are telling me the data is annotated, so how can it perform on an unseen or unknown use case?" I will come to how the architecture performs in that case in the later half of this session.

So, again, GPT-3 is nothing but a modified Transformer architecture. It is so robust, with a huge amount of data and more than 780 million parameters; that tells us that 780
million features are being processed to train GPT-3. Even so, it is modifying itself every day, because the type of AI that GPT-3 deals with is called adaptive AI. Many of you might ask what adaptive AI means. It is something that generally happens in our daily life. Suppose you are a student and you have just sat an examination. In the examination you solved two questions, and there were two questions you could not solve, and you know you do not have proper knowledge for those two. Think about the scenario: either you solved the questions, or you solved them but the solution was wrong, or you could not solve them. So your brain says: okay, I will go back home after this examination, thoroughly revise this concept again, and get it right. See, the use case is very specific: you know your error, this is my error point, and I am specifically rectifying my error. This is what adaptive AI is. The AI understands every day what its false negatives are, what its false positives are, and where the problem is, and it keeps learning continuously on those errors to improve itself every day. OpenAI is playing in this domain of adaptive AI. You can also think of it as a modified, self-learning AI, and this is what the future is about: an AI model that can understand its errors on its own. This is one of the flavors of GPT-3. So again, GPT-3 is a robust base of this kind of Transformer architecture.

Okay, cool. When OpenAI played with this kind of architecture, one of the common things they identified is text generation. So now I will come to NLP. The domain of NLP is very interesting, because whenever we speak about NLP, it's all about text
and audio, right? So it is literally very difficult to globalize this domain. NLP has several sub-domains. You can do text classification; examples like sentiment analysis have been in the market for a long time. You can do detection as well; the sentiment analysis problem can be approached either way, as classification or as detection. There are also applications like storytelling by AI: you give a prompt to an AI model and the model writes the next lines. This is called generative AI.

GPT-3 globalized this entire thing into a single domain, called text generation. You might say: "Okay, I can understand the use case you just described. Storytelling means you tell the AI, 'give me a bedtime story', and the AI gives you a bedtime story; or you give a prompt like 'Ram and Shyam were playing football...', ask the AI to complete the story, and the AI generates it." That is generative AI generating text, so it makes sense to call it text generation.

Now you might come up with another use case: "Suppose I give the AI a sentence like 'You can't do nothing in life' and ask what sentiment this sentence has. How can you fit this into a text generation algorithm? This is obviously a classification or detection problem, so what are you generating here?" Yes, this can equally be treated as a text generation problem. Think of the entire text as a sequence, a mathematical sequence. The entire text is visualized by an AI model as a mathematical sequence or series, right? A lot of
series are what we deal with here, series of tokens. Every text is composed of tokens or values, and those tokens are arranged and organized in the form of a series. There are multiple types of series in mathematics and in AI, the Hamilton series, the Taylor series, and so on; I am not going into that kind of deeper mathematics, but a series is formed by those tokens, or by their vector values.

So what are you doing? Your input is a series, and you are generating an output that is also a series of values or tokens. What the model is doing is understanding one series and giving another series as output. If I give you a sentence like "You can't do anything in life", it is a series of tokens or a series of vectors. If you consider each and every character to be a mathematical figure, it becomes a series. The AI processes this input series and, in return, gives another series of values or characters. The basic hypothesis behind the text generation algorithm is: treat every input, every text, as a series of vectors or a series of tokens. GPT-3 works mostly on series of tokens, so that you can fit any problem statement robustly within a single algorithm. I guess that makes sense. If you have any doubts, you can ask in the middle of the session, or I will address all doubts at the end of the session as well.

So GPT-3 globalized the entire NLP industry into one single problem statement, text generation, playing only on the series of features. Before coming to the architecture-level understanding, which is a little bit boring to some people or
interesting to some people, a debatable point, let us see what the present use cases of GPT-3 are and how the evolution took place.

Right now the most common use case of GPT-3 is around code, which is called code automation. OpenAI's GPT-3 is so powerful that it can write code for you. Suppose you are a competitive programmer solving a problem on Codeforces, LeetCode, GeeksforGeeks, or anywhere else. I will speak specifically about LeetCode, because I have a fair amount of experience using GPT-3 there and have discussed it with other people: it can easily solve almost every LeetCode question; there might be hardly one or two questions it cannot solve. I mostly feel that it was trained on LeetCode submissions and solutions; that is my observation, I don't know for sure. So you can give GPT-3 a problem to solve. It might be a LeetCode problem, a GeeksforGeeks problem, or a Codeforces A, B, or C problem. It fails on Codeforces D problems, and Codeforces B and C are the ambiguous cases for GPT-3 right now. It will write the code for you and solve the problem, you can copy and paste it into that particular platform, and your test cases will pass. That's the game of writing code. The biggest example you can see is GitHub Copilot, which every one of us knows.

Another flavor of code automation that GPT-3 added is code translation. A common problem in a developer's or programmer's life goes like this: suppose you are a Python programmer, but for your company you have to write some code in SQL or Go, or maybe Scala or Perl. In my last-to-last company I was a dedicated Python programmer; I play with R and MATLAB as well, but I am not
so comfortable with languages like Ruby and the like, because we generally do not use them a lot for deep machine learning work; but the company had some specific use cases for them. So the choice is either to learn the programming language from scratch or to transform your entire code. That means: write your code in Python, send it to GPT-3, and GPT-3 will transform the code into any language you want. That is code transformation. I will also give you some real-time demos in this session.

Another big thing GPT-3 has achieved, apart from code translation, is code explanation. Most of the time, we developers and programmers look at a lot of other people's code. Suppose I have to make an app and I am looking at someone else's code. Most of the time goes into understanding what the person has written, because not every piece of code is production-ready. It has a lot of assumptions, the variable declarations may not be in a proper format, and the code is not readable. Here GPT-3 helps you: just paste the code into GPT-3 and it will explain the entire code in any language you want. If you want the explanation in English, it will explain in natural, human-level English; if you want it in Japanese, it will give you an explanation in Japanese; Chinese, even my own Bengali, it can generate. This is the power of code explanation.

There are some other amazing things that I have achieved, personally and at my company, with the power of GPT-3. Newton School is also playing an amazing game using the power of GPT-3 and OpenAI. A major problem with beginner programmers is that they are not able to work out the time and space
complexity of their code; in the initial days every programmer struggles with this. So we automated this process with the help of GPT-3 at Newton School. Newton School revolutionized the process: write whatever code you want and just send it to us. The GPT-3 algorithm will tell you: this is your time and space complexity, this is the actual time and space complexity, and this is how you can improve the time and space complexity of the specific code you have written. It gives a beginner programmer a quick guide for moving his or her code toward the optimal solution. We were also able to generate hints for any programming problem: whatever problem you come up with, we give you some subjective hints, such as "you can go with a recursive algorithm here" or "you might use hash maps". So that is all about code automation.

Another thing the OpenAI people have achieved is conversation: you can have a conversation with GPT-3. So ultimately lonely people have someone, an AI, to interact with. GPT-3 can talk with you. GPT-3 can take your interview, or a human can be the interviewer and GPT-3 can answer the questions. We are also revolutionizing doubt solving by customizing GPT-3 for our use cases at Newton School: any subjective, theoretical, or conceptual doubt can be solved with the power of GPT-3. Suppose you come with a question like "I am not able to understand this particular function in merge sort"; you ask GPT-3, and it will answer your question easily within seconds. This is called conversational AI, which GPT-3 has modified and revolutionized.

GPT-3 can write poetry, GPT-3 can write stories, GPT-3 can
generate artificial voices, and GPT-3 can generate artificial music and songs for you: just type the query "generate this song for me", and by the power of AI GPT-3 will generate it. One example I have seen: some researchers at Hugging Face, from Russia or Italy, I can't recall right now, generated a complete movie script with GPT-3. They just gave GPT-3 a prompt: "We want to direct a movie, the basic outline is this, can you write a complete script for it?" and GPT-3 wrote a complete script for that movie. The movie is also available on YouTube; anyone can go and check it out.

Apart from question answering and conversation generation, GPT-3 is able to generate designs for you. There is also a tool for Figma or Adobe called Anima, and Anima is also using GPT-3. As a designer, you just write a prompt, "I want this and this and this design", and it can generate those designs for you. It can generate mock-ups from simple prompts. So it is trying to make life easy: you just need to know English, go to GPT-3, and write the prompts, and everything gets generated. These are the common industry-level examples that people, and we, are building with the power of GPT-3.

Cool, so let's get started with the next part of the session. Let us understand how GPT-3 works and how you can customize your GPT model or algorithm. GPT-3 is not the first thing the OpenAI people have done. I guess every one of you knows about the DALL-E 2 engine, right? DALL-E 2 is a famous image generation engine. It has a lot of competitors right now; Google has launched one, and several other people have launched their own; but DALL-E 2 was the first version of the revolution. I'm just giving you the history of
DALL-E 2. The father of DALL-E 2, the person who made DALL-E 2, is Paul Boris. He has been running this experiment for the last nine to ten years; there is a platform called craiyon.com, which you can also find right now on Google. Craiyon was the first platform he made some years back, and since then he has been collecting data to make the most advanced image generation engine of this era. The same thing goes for GPT-3. GPT-3 has previous versions: the original GPT, GPT-J, GPT-Neo, GPT-2, and several other variations; GPT-J, GPT-Neo, and GPT-2 are the most common ones. These are all experiments that these people ran in the market for a long time; they collected a huge amount of data, and then they were able to make such a robust engine. On a very basic understanding, it is nothing but a common crawl from all over the internet. If you ask me what GPT-3 is, it is the most advanced search engine: if you tell GPT-3 to go and print the code from LeetCode for a particular problem, it will print exactly that code for you. So you might call it the most advanced search engine. This is how the data collection happened for GPT-3.

There are several alternatives to GPT-3 also coming to the market, just for your interest. Since OpenAI Codex or OpenAI GPT-3 is paid, or you might need some sort of special access to use it for free for one year, open-source people like Hugging Face Transformers are also making alternatives to GPT-3. One such alternative, made by the Hugging Face people, is also a type of GPT-3, called the BLOOM engine. BLOOM is also an LLM, but it is free to use; any one of you can use it, and it has the same kind of applications that GPT-3 provides,
that is, code automation and so on. The results are not as impressive as those of GPT-3 by OpenAI, but it is also quite good, and it can be scaled up. Apart from this, Facebook has an older experiment along the lines of GPT-3 called the OPT engines, but OPT still has not been able to achieve much SOTA. Even the people from IBM did a lot of experiments before the birth of GPT-3, called the CodeParrot or CodeNet experiments. They also did these kinds of things for code automation, but they were not successful either. The major reason is the architecture: the architecture that the OpenAI people wrote is the most advanced thing, and it revolutionized everything. So let us understand the architecture that GPT-3 follows.

Yes, some people have a debate here: why call GPT-3 a search engine when it is not a search engine? Yes, obviously GPT-3 is not a search engine, but by the end of the session I will specifically address why you can make the hypothesis that GPT-3 is like a search engine, and I will show you some use cases.

To explain the architecture, I am taking an example from someone else's talk; this is a PPT from one of the Stanford researchers. I am using it just to explain some mathematical concepts: why these LLMs are so robust, why they revolutionized everything, and why they sidelined the other architectures such as OPT and BLOOM. What is the game these people play in this architecture? They have a huge probabilistic difference for each and every feature. What does this probabilistic difference mean? You are speaking a sentence to someone, right? At this particular time, I am giving this
session to you people. In this session I am saying every sentence with some thought behind it; I cannot speak to you without a thought, and I cannot give you jargon that you are not able to understand. We humans, whenever we speak a word, think about what to say and what not to say. Until then, AI had this power too, but it controlled it only with some sort of threshold values or probabilistic values, nothing else.

The first-level game that GPT-3 played here is to assign a probabilistic score to each and every word, or rather not even a word, to each and every token that GPT-3 predicts. It has a general inference of what has already been said, what the user has already said; that means GPT-3 always needs an input prompt. From the input prompt, it understands the entire context of what the user wants and what the background context behind it is. Based on that, it assigns probabilistic matrices, or probabilistic weights, to each and every value of the prediction. For that reason, GPT-3 does not predict something that is unrelated to the central context.

Many of you might be wondering how it gets its context, or what is meant by context. I am not going deeper into the mathematical concepts; I am just giving the basic ideas. How does a basic AI architecture generate a context? The entire thing is a mathematical matrix at the back end of an AI model, and every mathematical matrix has some kind of central, pivotal value. If you come from a statistical background, you will understand: you have the central tendency of the data, you can play with the variance of the data, and you have the central limit theorem. So basically all data have
their central point values, and the central point values have unique figures. Those unique figures, that unique matrix representation, tell you what the data is all about. That unique numerical value or numerical factor is considered to be the context vector.

So we assign the probability. Now, the probabilistic thing here does not come from the central Bayes theorem; it comes from conditional probability logic. Why is it called conditional probability? Because conditional probability gives you a very good, elaborate context about which matrices to take, which context vectors to take and which not to take, so that you can adjust your probability according to your predictions. In very layman's language, it assigns a probabilistic score to each and every prediction so that it is completely related to your context, and so that any value predicted by GPT-3 does not end up as a false positive. Cool. So this is the basic idea of the probabilistic game of GPT-3 and how this architecture works.

Now let us see the neural network architecture of this engine. What GPT-3 does is... yes, George: Google actually has the same kind of model, the closest or a superior one. Yes, you are absolutely correct. From my experience at Google, yes, Google has more advanced architectures compared to these GPT-3 engines, and they even made an alternative to DALL-E within a day. But let us get back to the talk.

Okay, let us get back to the screen and see the neural network architecture of this GPT-3 engine. For those peeps who are already into AI and have already seen the Transformer architecture, this type
of building is not so new to you. But where is the twist? The twist is in the Transformer block, in the Transformer architecture. What game does it play at the back end? It takes the context vectors of each and every input, of all the data you have passed to the model, and it does not do bulk processing at a single point in time.

What do I mean by bulk processing? Whenever you work with the trivial neural network architectures, say a trivial CNN architecture or a trivial ResNet-101 architecture, what generally happens is that engineers mostly focus on getting more and more contextual features, the kind of features that are actually useful for the computation. But now you have a bulk of features, so how will you do the feature learning? The GPT-3 peeps mostly focused on this. Basically, they are not doing bulk processing of that whole pool of features.

Let me put it mathematically. There is a concept of centroids and logics by Frederick Eric; anyone can find it on Google. The concept goes something like this. You have a pool of matrix values, some huge set of matrix features, suppose N matrix features, and you consider the entire structure to be a centroid, or a hollow centroid structure. That is, you have N values and you are trying to align them within a structure. Obviously some values will be at the central point and some will be very far away from it. If you try to trace one value to another, or suppose you are trying to calculate any kind of distance between the
values to get some context about their orientation, that will be a big anomaly, because you cannot properly pivot a lot of features at a single time: which feature is important, which feature needs to be processed first, which needs to be processed last, which feature plays an important role in prediction and which does not.

In general, if you look at a neural net at the matrix level, our trivial neural networks always build a structure inside the computation. Several structure models have been proposed by a lot of mathematicians. I am not going into them, but the famous ones are what we call a circular structure, or a centroid or haloid structure, something like this. But in every case some features always fall out of the neural network computation through dropouts and things like that, and those might be important features.

So the GPT-3 peeps said: okay, in this architecture we are ready to increase the number of processing blocks, but we are not ready to lose our features, because we cannot tell which one is important and which is not. So basically they made L × 12 repeated blocks of the Transformer network, or say N × 12 Transformer blocks. In each Transformer block the features get prioritized: the first set of features is the highest-priority one. So they also maintain a sort of priority queue.

You might be curious how they maintain a priority queue. These priority queues are maintained, again, by calculating numerical values for each and every feature, and those numerical values come from something called a Laplace coefficient. So this Laplace
coefficient says that the matrix values with the highest weight should always have the highest priority. If you read the actual GPT-3 research paper you will get a proper understanding of this Laplace coefficient; I am not going into that depth. Simply put, you maintain a priority queue, and in the priority queue the most prioritized features come first and get processed individually in each Transformer block, then the second, then the third, so that the model has a proper level-sensitive understanding of each and every architecture. So this is where the twist is in the GPT-3 architecture.

I have already explained the difference between a CNN architecture and a general GPT-3 block: a CNN is a straightforward block with a single-block game, while GPT-3 has multiple Transformer blocks in play.

The last but not least important twist in GPT-3 is called masked self-attention. Since many of the peeps here are following this session well, I guess everyone is familiar with attention networks. Attention architectures, or self-attention architectures, are already common in the market; they are not a very new thing. But there is something called masked self-attention, and masked self-attention is what makes GPT a robust model. What does it mean? When the model is completely trained, it has a complete pool of features. That pool has several kinds of features: it might have a feature set specifically for classification, a set specifically for generation, or a set for any other task. If you give a query to the Transformer, which features are to be selected or not selected, or
which pool is to be calculated or not calculated: this game is handled by the masked self-attention. I am not going too deep into it; we have very little time, and I have to show you some demos as well.

Cool, so now coming to the next part of this talk. I have already explained the unique things that the GPT-3 architecture has inside it. Also, in general, since these models are few-shot learners, they can easily understand each and every contextual feature at a fine level of depth. Now, when it comes to a specific query... just a second, let me show you a demo from OpenAI's website, since I have access. Then I will come to our examples, and then I will jump into your queries.

So what I was saying is that GPT-3 is all about few-shot learning, and since it is a few-shot learner, it has an easy context for each and every minute-level feature. Now let us understand what GPT-3 does with an example. You can see OpenAI has multiple examples here, right? There are a number of examples that OpenAI has created. Let me pick a code-level explanation. We have something called code to natural language, one of the examples I just told you about. Here we have a common code sample, and I guess everyone can see my screen and the code equally well. So we have some general code in Python, and we want to generate an explanation of it. GPT-3 has multiple variations of the model. The most advanced model for this kind of code-level automation, or any kind of code automation, is GPT-3
Codex, and for any kind of text automation or text generation it is something called GPT-3 text-davinci. So let us see what kind of explanation it generates and how good the explanation is. It says: okay, the function takes three arguments, this one is a data frame, and the column is named completion. Nice. So it has explained nicely what the entire code is doing.

Now many of you might be thinking: this is a pre-written example, the code was already there, so of course it does well. So let us take an example from Codeforces. I am just copying some code, any example from there, once the site loads. I am just manually taking any submission, any XYZ submission. Okay, this code is not in Python. And yes, we always have to specify which programming language we are dealing with. I am also not asking for a completion now, so I have to change my prompt.

See, some time back someone was saying that GPT-3 is not a modified search engine. But it can tell well where I am copying the code from, and this is the reason for my hypothesis that it is actually searching all over the internet. It is not a real-time search; it is a search from a database. So it is writing an explanation of this code. This engine is so robust that it can generate anything, and it is giving a lot of explanation.

Now if I say "calculate the time complexity of the code above", let us see what GPT-3 gives. Okay, it is giving the time complexity of this code. If you change the query to space complexity, it will return the space complexity of the code as well. So this is a very quick demo
of a GPT-3 engine. You might be curious how to get access to these models. You go to OpenAI and tell them whatever use cases you have, and if the use cases sound good, you get access.

Okay, someone has posted some code in the chat, so I am just testing it out. Do you want an explanation, or the complexity, or a transformation? What kind of use case do you want? This is Java code, yes. Okay, just an explanation is fine. I don't know why my search is not working; it is not taking the text from the Zoom chat. Just a second, I am typing out the code. Cool, let us see what GPT-3 gives. I will not be surprised if it fails. You just have to rewrite the query, because it is glitching a bit.

If you want to get a feel for this kind of engine, the openly available things you can use right now include Replit Codex. Replit is a compiler, and they have already introduced GPT-3 in their IDE coding engine; a lot of people are using it. Also, if you have a GitHub student account, or if you are an open-source maintainer of any repository, you can use Copilot for free. Another platform is Hugging Face Transformers, where you can use the small BLOOM model. And soon we are also going to bring this kind of GPT-3 use case to the market from Newton School; we are using it in a lot of places.

Many of you might be thinking about how to customize this model. There are generally two ways. This is just a quick part, and then I will jump into the Q&A. First, you can go directly to the OpenAI documentation, which gives you a way to customize, that is, fine-tune, the GPT-3 engine. All you have to do is collect a heck of a lot of data, and you can
either annotate the data or pass the data directly, since these engines have self-annotating power as well, and then simply pass the data by following the steps. This is the most trivial way to do fine-tuning. If you do not want to do this type of fine-tuning and want to write code for it, what you can do is add additional layers by going to the code base of these GPT-3 engines and writing additional subsequent layers for the model, so that you can modify it for your own use cases.

Cool, so this is all about the power of GPT-3. About the use cases: at Newton School we are experimenting with it, and we are using OpenAI and GPT-3 a lot to revolutionize EdTech, from interview automation to code automation to giving pointed analysis to a coder ("here you are writing this wrong, you should do it this way so that your code is optimal"), and in a lot of other places.

So let us jump into the Q&A. "What is GPT-3" is already answered.

"What is a paradox?" Nice question. If you have studied axioms, before studying geometric theorems, axioms are things that we take for granted after a lot of observations. The same goes for a paradox as well.

The next one asks whether the dataset is freely available or chargeable. It is not a dataset; I guess you are asking about the GPT-3 model. The GPT-3 models are not that costly: it is 0.0012 dollars per thousand tokens, that is, 1024 tokens, and 1024 tokens is roughly 750 words, so that is very minimal. You can also log in directly on the GPT-3 website and get a one-month free trial without giving any
kind of card details. But yes, the general architecture that the OpenAI peeps have built is not going to be open source in the near future either, because this is the revolution and they will earn a lot of revenue from it. Also, GPT-3 is not going to be freely accessible to everyone in industry, specifically not to students and those who are still practicing, because these things can be used for a lot of cheating and the like. I have talked with some fellows at OpenAI, and they are already working on this kind of policy. So yes, it will be accessible to professionals, industry leaders, developers, and business people; everyone can use it, but you have to be specifically verified.

"How to do the GPT-3 setup for a specific domain, and how to include the domain's data in the database?" Okay, this is a nice question. See, for GPT-3 setup, suppose you have access to GPT-3; it might be Codex, it might be text, or it might be free access. The first thing is to understand what use case you want to build, then come to the Playground and try several queries. The major game in GPT-3 is that you have to understand which queries it understands, because there are many queries it cannot understand, and it will return jargon to you. So understand the query.

I am going back to the Playground. There are innumerable ways to integrate GPT-3. I am taking a quick example: suppose this one, "Parse unstructured data", and suppose this particular query meets your need. The OpenAI peeps give you API-level code for this. This is high-level API code, so you can directly integrate this API with your use case, your application, or your website. You can also modify these individual queries with some automation. So this is one way to directly integrate
OpenAI, that is, directly integrate it through the API. This is one of the ways wh
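
As a rough illustration of the API-level integration described above, here is a minimal Python sketch that builds (but does not send) a request of the kind the Playground's code export produces. The endpoint URL, the model name `text-davinci-003`, the prompt wording, and the helper name `build_explain_request` are assumptions made for illustration, not something shown in the talk; check the current OpenAI documentation before relying on any of them.

```python
import json
import os
import urllib.request

# Assumed endpoint and model name, for illustration only.
API_URL = "https://api.openai.com/v1/completions"
MODEL = "text-davinci-003"


def build_explain_request(code: str, language: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) an HTTP request asking the model to explain code."""
    # The talk notes that the programming language should be stated in the prompt.
    prompt = f"# {language}\n{code}\n\n# Explanation of what the code above does:\n"
    payload = {
        "model": MODEL,
        "prompt": prompt,
        "max_tokens": 128,
        "temperature": 0,
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


if __name__ == "__main__":
    key = os.environ.get("OPENAI_API_KEY", "YOUR_KEY")  # hypothetical placeholder
    req = build_explain_request("print(sum(range(10)))", "Python", key)
    print(req.full_url)
    print(json.loads(req.data)["prompt"])
    # Sending it would be: urllib.request.urlopen(req), with a valid key.
```

Keeping the request construction separate from the network call makes it easy to swap the prompt (explanation, time complexity, space complexity) without touching the transport code.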

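For reference, the masked self-attention mentioned in the architecture part of the talk can be sketched in a few lines. In the standard decoder-only Transformer, the mask is causal: each token may attend only to itself and to earlier tokens, which is what lets the model predict the next token from the prompt so far. The sketch below is a toy under stated assumptions (pure Python, a single head, no learned projections, made-up input vectors); it is not GPT-3's actual implementation.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def causal_self_attention(x):
    """Single-head self-attention with a causal mask.

    x is a list of token vectors. Queries, keys, and values are all x itself
    (no learned projections), which is a simplifying assumption of this toy.
    Returns (outputs, attention_weights).
    """
    d = len(x[0])
    outputs, weights = [], []
    for i, q in enumerate(x):
        # Causal mask: position i only sees positions 0..i.
        scores = [
            sum(a * b for a, b in zip(q, x[j])) / math.sqrt(d)
            for j in range(i + 1)
        ]
        w = softmax(scores)
        # Future positions are masked out, so their weight is exactly 0.
        weights.append(w + [0.0] * (len(x) - i - 1))
        outputs.append([sum(w[j] * x[j][k] for j in range(i + 1)) for k in range(d)])
    return outputs, weights


if __name__ == "__main__":
    toy_tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # made-up vectors
    _, attn = causal_self_attention(toy_tokens)
    for row in attn:
        print([round(v, 3) for v in row])
```

The printed weight matrix is lower-triangular: every entry above the diagonal is zero, which is the mask at work.
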
Original Description

The world of LLMs is dominating the technology landscape across industries. Various cutting-edge projects (past, ongoing, and futuristic) are executed in the industry using large language models such as GPT-3. In this DataHour, Anustup will cover all about LLMs, starting from the fundamental definitions of LLMs and highlighting the important mathematical understanding, through to real-life examples. He will also explain a real-life project based on customizing GPT-3, a learning path for LLMs, and finally how you can kick-start your career in AI with opportunities.

Prerequisites: Understanding of Python programming and an interest in AI with good fundamental math.

🔗 More action-packed sessions here: https://datahack.analyticsvidhya.com/contest/all/

Stay on top of your industry by interacting with us on our social channels:
Follow us on Instagram: https://www.instagram.com/analytics_vidhya/
Like us on Facebook: https://www.facebook.com/AnalyticsVidhya/
Follow us on Twitter: https://twitter.com/AnalyticsVidhya
Follow us on LinkedIn: https://www.linkedin.com/company/analytics-vidhya

This video teaches how to customize GPT3 for real-life use cases, covering its architecture, applications, and fine-tuning, and provides practical examples and tools for implementation. By watching this video, viewers can learn how to build custom large language models, fine-tune GPT3 for specific use cases, and apply GPT3 to real-life applications.

Key Takeaways

Install and set up GPT3 using OpenAI
Fine-tune GPT3 for specific tasks using custom datasets
Apply GPT3 to real-life applications such as code automation and text generation
Use tools like Hugging Face and TensorFlow to optimize GPT3 performance
Evaluate and improve GPT3 response quality using effective prompts
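
The fine-tuning takeaway can be made concrete with a small data-preparation sketch. It assumes the prompt/completion JSONL layout used by OpenAI's legacy GPT-3 fine-tuning flow; the separator string, the stop marker, and the function name `to_jsonl` are illustrative choices, not values taken from the talk.

```python
import json


def to_jsonl(examples, separator="\n\n###\n\n", stop=" END"):
    """Convert (prompt, completion) pairs into JSONL text, one record per line.

    separator marks where the prompt ends; stop marks where the completion ends.
    Both are assumed conventions and should match what is used at inference time.
    """
    lines = []
    for prompt, completion in examples:
        prompt, completion = prompt.strip(), completion.strip()
        if not prompt or not completion:
            continue  # skip incomplete pairs rather than train on them
        record = {
            "prompt": prompt + separator,
            "completion": " " + completion + stop,
        }
        lines.append(json.dumps(record, ensure_ascii=False))
    return "\n".join(lines)


if __name__ == "__main__":
    pairs = [
        ("Explain this Python code: x = [i * i for i in range(5)]",
         "It builds a list of the squares of 0 through 4."),
        ("Explain this Python code: print(len('abc'))",
         "It prints the length of the string 'abc'."),
    ]
    print(to_jsonl(pairs))
```

Each output line is a standalone JSON object, so the result can be written to a `.jsonl` file and inspected or validated line by line before any upload.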

💡 GPT3 can be customized and fine-tuned for specific use cases, allowing for more accurate and effective applications in real-life scenarios.
