Fine-tune ChatGPT w/ in-context learning ICL - Chain of Thought, AMA, reasoning & acting: ReAct

Discover AI · Beginner ·🧠 Large Language Models ·3y ago

Skills: LLM Foundations 90% · Fine-tuning LLMs 80% · Prompt Craft 70% · Advanced Prompting 60%

Key Takeaways

The video discusses how to get better results from ChatGPT-style autoregressive models without paying for fine-tuning, by using in-context learning (ICL): Chain-of-Thought prompting, Ask Me Anything (AMA) prompting, and ReAct (reasoning and acting). It also explores prefix tuning as a lightweight alternative to fine-tuning for natural language generation. Tools and techniques covered include GPT-3, BioGPT, the Hugging Face platform, and the ReAct scheme.
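As a rough illustration of the source and target sequences that the transcript describes for BioGPT on PubMedQA, the following Python sketch builds both strings from one record. The function names, the placeholder record and the exact template wording are assumptions reconstructed from the spoken description; this is not code from Microsoft's repository.

```python
# Sketch only: templates reconstructed from the talk's description of the
# BioGPT / PubMedQA setup. Names and wording are illustrative assumptions.

ALLOWED_LABELS = {"yes", "no", "maybe"}


def build_source(question: str, context: str, long_answer: str) -> str:
    """Prepend the description words and concatenate into one source string."""
    return f"question: {question} context: {context} answer: {long_answer}"


def build_target(label: str) -> str:
    """Wrap the yes/no/maybe label in a fixed natural-language sentence."""
    if label not in ALLOWED_LABELS:
        raise ValueError(f"label must be one of {sorted(ALLOWED_LABELS)}")
    return f"the answer to the question given the context is {label}"


# Placeholder record (made up for illustration, not taken from PubMedQA).
record = {
    "question": "Does therapy X reduce outcome Y?",
    "context": "Abstract text of the paper goes here.",
    "long_answer": "The study indicates that therapy X seems to reduce outcome Y.",
    "label": "yes",
}

source = build_source(record["question"], record["context"], record["long_answer"])
target = build_target(record["label"])
print(source)
print(target)
```

At inference time the source string would be given to the model as the text to condition on, and the model would be expected to generate the target sentence ending in the label.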

Full Transcript

Hello community! I hope you are impressed by the title, but you know, the truth is, it is very easy today. It is one of the simplest and one of the most complicated things, and I would like to show you why.

In this video I want to show you that if you do not have the money to run the fine-tuning of GPT systems, or you cannot afford to fine-tune your own ChatGPT, or you cannot afford to fine-tune PaLM 540B or whatever autoregressive large language model you have, there is a way, without spending tons of money (actually you don't have to pay anything), to get the best results out of your pre-trained LLM. And the answer is simple: you have to give an intelligent input. But it is something completely different from what you might expect, and with this input you get significantly better results back without spending a single US dollar. So for this we have to do a little bit of theory.

Now, you know the word fine-tuning. I have about 50 videos about coding fine-tuning on BERT models, BioBERT, DeBERTa or whatever you want. And then on the other side of the Transformer, on the decoder stack so to speak, we have those GPT monsters: GPT-3, BioGPT, ChatGPT. And now they also use the term fine-tuning. What I would like you to understand, and then you are one of not a lot of experts, is that fine-tuning is not always what it says it is. With a GPT system we can do prompting, and, for example, I will show you that with BioGPT we will do prefix tuning. Officially, or to be imprecise, everybody, even Microsoft, calls it fine-tuning, but I want you to understand that if you have to code it, it is not fine-tuning. It is something completely different, and I would like to show you how to do it and give you the explanation why it is so.

With GPT-3 or ChatGPT or whatever you want to call it, there's the option of prompting. Now prompting means prepending specific instructions and a few examples to the task input, and then you generate the output from the language model, which is significantly better, so the
system learns with an intelligent input. GPT-3 in particular uses manually designed prompts to adapt its generation to different tasks, and this framework has a specific term: when you hear in-context learning, or ICL, you know it is about prompting, input prompting, prompt engineering of the input to a GPT system.

Now there's a limitation, since a Transformer, and we have here the decoder stack, can only condition on a bounded-length context. For GPT-3 it is defined as 2048 tokens in length. So in-context learning, which is what I will refer to as prompting, is unable to fully exploit larger training sets, training sets that are longer than the context window. You have quite a hard limit of 2048 tokens for GPT-3.

Therefore, and I showed you in one of my last videos that BioGPT was now released by Microsoft on the Hugging Face platform, they diverged a little bit and used a recent development that is called prefix tuning. This is a lightweight alternative, as they themselves call it, to fine-tuning for natural language generation. And we are here with an autoregressive system: you are given the beginning of a sentence and you want the next word and the next word to be inserted by the machine. This new methodology was inspired by prompting.

So you see, you have the classical fine-tuning task on the BERT systems, on the encoder; on the decoder you get prompting, which was really nice as a new alternative; and as the latest state of the art, if you want, with BioGPT, published on Hugging Face just some weeks or days ago, you have this prefix tuning. Even Microsoft calls it a fine-tuning approach, but you will see it is not. What we can learn is that if we understand the system, we can get results even without applying fine-tuning, if we have a clever input.

So here we go. Whatever we tune, whatever we do, we have to have data. All these models, whatever there is, are pre-trained on a huge dataset: millions,
billions, trillions, quadrillions of documents. Beautiful. If we do fine-tuning, we do it for a specific downstream task. Let's say today our downstream task is question answering. Beautiful.

Now the first thing: you know GPT-3 has been available for years now. So you have openai api fine_tunes.create, and then everything. And if you read really closely here (and please note the copyright information), under conditional generation OpenAI lists paraphrasing, summarization, entity extraction, product descriptions, chatbots, whatever you can imagine. And they give you guidelines if you pay OpenAI to fine-tune the GPT-3 model. For example, they say you have to use certain separators, you have to have an end separator, and you should aim for at least 500 examples, and I will show you exactly how the examples have to look. And they tell you to ensure that the prompt and the completion do not exceed 2048 tokens, including all separators. So, as I just showed you, we are bound by 2048 tokens for GPT-3; this is an example for GPT-3.

So please check yourself; I have here platform.openai.com, the fine-tuning guide, "Prepare your dataset". This is the classical approach. And here I would recommend going to this notebook; I leave here a Colab research link, "Fine-tune GPT-3 with Weights & Biases" (W&B). This is a really interesting page, have a look at it. I'm not sponsored by them, not at all, but use the resources that are available.

And here you see, for example, what the prompt and completion structure is. We have the prompt, prompt one, let's say "Livermore, California", and the completion: "Livermore is a city in Alameda County, California, in the United States". Beautiful. Next one: "Albany Woollen Mills", "also known as Western Australian Worsted and Woollen Mills Limited, was a wool mill located in Western Australia". So you see, you have a prompt and you want the system to complete this prompt with the learned information. So, rather easy to understand.
Beautiful. Now this was GPT-3; now let's look at BioGPT. You see here: published on arXiv, January 13, 2023, Microsoft, Microsoft, Microsoft. I had a particular video already on this, but what I want to show you is that we need data for BioGPT, and I want to show you the data Microsoft used. This is public-domain data; it's called PubMedQA, and it is a biomedical question answering dataset. As I told you, today we are only in the question answering part, to make it easily understandable.

This public dataset is available: you can download it, you can play with it, you can use it. You have a sample structure. You have a question, a biomedical question: what is the drug, what is the biochemical structure, what is the molecular structure, how is the drug-drug interaction, whatever. Then you have an answer. And from the scientific papers, the abstract of a scientific paper, you have additional information that's called a reference or just a context. And then, as it's a question answering dataset, they go for a binary or ternary answer: yes, no, maybe. This is what they call the label. So you have question, answer, context and label, and the dataset is built up in exactly this particular pattern. And then you have to know two further pieces of information: you have the source sequence and the target sequence, and I will show you both.

PubMedQA is, of course, the first question answering dataset where they tried reasoning over biomedical research text, especially the quantitative content, yes, yes, yes. But why does Microsoft use this? It is the largest set of human-expert-annotated yes/no/maybe questions in the biomedical domain. So we have here the human input, the human feedback, the human evaluation of this dataset. If you want to have a look, of course there's a research paper, please have a look here. You're not going to believe that Google was involved in this, but of course. And here you have now one data sample, just to show you the
structure, and we need to understand this to generate better input. At first you have a question. Beautiful. And then you have the context from somewhere, an abstract, a scientific abstract; they tell you blah, blah, blah. Yes, beautiful. And then they have what they call here a long answer. So somewhere in a scientific paper there's the sentence, "Our study indicates that preoperative statin therapy seems to reduce" the development of some whatever. This is the conclusion, because you ask "is there something?" and they say yes. And now, from this long answer, a human expert came down to a binary answer, yes or no, or, that's right, yes, no, maybe, and this is exactly what you want. But just to tell you, have a look here: "a study indicates that the therapy seems to reduce". So I would have gone here for "maybe", but the expert decided to go with "yes" here in biomedicine. Beautiful.

So you see, this is the data structure that we will now do the fine-tuning on. When you read the research paper, Microsoft writes of fine-tuning BioGPT for a downstream task. Remember, we are always focused on the downstream task; we have question answering, or question and answering, whatever you want to call it. "Fine-tuning" here, if you really want to be correct and you have to do it in code, is the wrong term, because it's not fine-tuning. Let me show you why.

Again, this is the text from their publication, and you see they have a source sequence and a target sequence. They state their methodology from January 2023 to fine-tune BioGPT. Microsoft itself says: we prepend the description words "question:", "context:" and "answer:" before the question, context and answer, and concatenate them together as the source sequence. So you see here: question, context, beautiful. They now have a sequence of text: the question itself that you want to know; then, in the fine-tuning, they provide you the context from the scientific papers, a summary or whatever particular parts, particular chapters of
the publication; this is what goes into the context. And the answer is taken from a publication, for example the long answer, the answer text, this sentence here. And then you have a target sequence, and you want this target sequence to be structured in a particular logic. So you say: the answer (the answer text) to the question (the question text), given the specific context (the context text), is, and now comes the binary classification problem, if you want: yes. So this is more or less the input data for what Microsoft calls fine-tuning BioGPT.

But we now know that it is not fine-tuning, it is prefix tuning. Because if you read their paper, you come to the paragraph where they say, which is absolutely correct: during the inference run, so after the pre-training, after the fine-tuning, "we provide the source text and the prompt as the prefix for the language model to condition on and let the language model generate the target output". So you see this prefix here in our sequence, where you have question, context and answer, all this additional information you feed into your input, into your query: this is what they call prefix tuning.

Prefix tuning optimizes a task-specific prefix that applies to all instances of that particular task. This is why it is used as an advancement on fine-tuning: we have downstream tasks, and here with an autoregressive system we have the beauty that this applies to all instances of the task. It is task-specific. So prefix tuning can be applied to NLG tasks.

If you now code this, it means that the language model parameters remain fixed, and just the prefix parameters become the only trainable parameters. To give you a rough idea of dimension: if you say the whole model's parameters are 100 percent, the prefix parameters are, as an idea, about 0.1 percent. So you see, the amount of training you have to do is significantly smaller, but you get similar results. Now of course this prefix tuning was not
invented here by Microsoft; you have the original scientific publication here by Stanford University, "Prefix-Tuning: Optimizing Continuous Prompts for Generation", for text generation. Have a look at the arXiv preprint; I can really recommend this paper. Just to give you an idea of the difference between the technical term fine-tuning that we apply here and the methodology of prefix tuning that BioGPT by Microsoft was "fine-tuned" with: in code, it is prefix tuning.

And you see here, normally you have the Transformer with its layers and you have the fine-tuning, and everything is in motion: everything will be altered, all the weights are included, and so on, for the task of translation, for the task of summarization, for the task of table-to-text, or for the task of question answering. Now with prefix tuning, as I told you, all the weights of the model stay, well, if you want, frozen. You have just this small vector here in front, this prefix, and this is where the encoding, the embedding, is going to happen. So, from this publication: it optimizes a small continuous task-specific vector, and this is what they call the prefix. Therefore prefix tuning: it is a task-specific vector tuning, based on your content of course.

If you dive a little bit deeper into this paper, you get a little bit of mathematics that tells you how it works with the loss function and the trainable parameters and the differences. Not for the moment; I just want to give you the main messages. Despite learning 1000 times fewer parameters than fine-tuning, prefix tuning can maintain comparable performance. And this is, if you think about the theory, amazing: one thousand times fewer parameters, and you get the same performance if you just do a different tuning of the model. So this is one of the advances here at the end of 2022, beginning of 2023, that you should also take advantage of, because if you have to pay for fine-tuning and you do this
methodology, well, you can calculate the difference.

There are a lot of other options coming up now. You see here, November 2022: "Ask Me Anything"; AMA is the short acronym. This is a strategy for prompting language models, and it is very funny; have a look at this paper. They just say: hey, why come up with just one prompt? Let's generate, I don't know, a set of prompts, put them all together and try which is the best prompt. You have so many possibilities to play with these systems. So they collect multiple effective yet imperfect prompts and aggregate them, and this can lead to a high-quality prompting strategy. If you have no financial limitations, you can try, experiment, whatever you like.

So this was more or less the green part here. We learned about fine-tuning, we learned about prompt engineering, and we learned about this new technology that advances on fine-tuning, was inspired by prompt engineering, and is now called prefix tuning of large autoregressive language models.

And now comes the main part of the talk. My goodness, you have new options with this. You can save a lot of money by not paying a cloud provider to fine-tune on your company-specific or domain-specific data; instead you can use something more intelligent, and it is called chain of thought, or ReAct. I would like to show you this, so that you can apply it and save something.

So first, chain of thought. This is January 10, 2023, Google Research, not a coincidence: large language models, so autoregressive models, ChatGPT, whatever comes out of Google in the next days, weeks, months, I have no idea. There is this beautiful chain-of-thought prompting methodology. Let me show you how it works. If you want to read the theory, the mathematics, I give you a link to the paper, but just to show you what it is: you normally have your standard prompting. You have your model and you type in, hey, this is my question, and you get an answer back from the system. And then you have the next question and you get an
answer back, and now this time the answer is wrong. Now they found out what happens if you provide not just this information, these three lines and an answer, but this as the input text to the question. So you give an example, a complete example, as an input to ChatGPT or GPT-3 or whatever you have. You say, hey, Roger has this and that, how many? And then you also give the answer as an example, and you give the chain of thought, your reasoning, your path of reasoning. You say: okay, we start with this, and then we add this, and this is the result of the first step, and then we do this, and then we alter it, and then we get a result. And given that structured answer, you have a similar question as the next question, the one you really want to have solved. Of course they have to have some connection here; you cannot go to something completely crazy, you have to have some overlap, if you want. You will get an output where the system now follows your, I will not call it reasoning, that is too much, your path of argumentation, your chain of thought for how to come to this result.

And this is something beautiful. You can teach the system: if I have this question, I want this kind of answer from you, this is the way you have to construct this answer for me, and now here is my question. And the model really, and this is the beauty of autoregressive systems, comes out and learns this. Now there are a lot of questions: how is this possible, why is this possible, what's the mathematics, what is the configuration of the system, what are the parameters? Forget all about it, read the paper. I just want to show you what you can achieve, what you can do without paying extra.

If you ask what the difference is, let's look here just at the yellow and the kind of orange, and this is the world of mathematics. Yes, we know we have to use a huge system, so a PaLM 540-billion-parameter system. If you use PaLM, the same system, in yellow
here, and you have the standard prompting, so you just ask, I don't know, what is three plus five, the solve rate will be just 18 percent. Although this model has been trained for, I don't know, weeks, months, centuries, on mathematics the solve rate is under-proportional, let's call it under-proportional. Now the same system: you don't have to pay for any fine-tuning on mathematics, you don't have to pay for any additional modules, you don't have to pay for any solver packages that people sell on the internet. If you just reformulate your input, you can get from an 18 percent solve rate to close to 60 percent, just by providing this kind of information as input to the system. So you see, this is something you really should make use of if you interact with ChatGPT or other systems. They are able to make these jumps in performance because you give them a chance: you show them, this is my question, this is the answer I expect.

And as a rule of thumb, as a feeling, how many question-answer pairs do you have to provide to really achieve a kind of optimum? In my case, for my text and my ideas and my answers, it is about five pairs of Q and A that I write: Q1, A1, Q2, A2, until I have five, and then I have my main question. Some colleagues tell me they need eight. Okay. So let's say between five and eight: if you give five to eight examples, those huge systems with 540 billion parameters are able to answer you in the way that you want. So do not stay with one example; as a rough idea, give five to eight and then your question, and you will be amazed by the quality of the answer. Prompting a PaLM 540B with just eight chain-of-thought exemplars achieves state of the art on a mathematical benchmark, surpassing even a fine-tuned GPT-3 with a verifier, and whatever. So you can do something if you have a clever input; you have to know about this.

And you're not going to believe it, now we have the next level of complexity. What
we do now: we take this chain-of-thought prompting and we combine it with acting. Acting is simply, as I showed you with you.com chat and Perplexity AI, that you now integrate online sources like Google Search or Wikipedia or, oh yeah, Bing, Bing will be the new, you know what I mean. So you now have this chain-of-thought prompting with the ability that the system has a link to external information. And what is this called? ReAct. Maybe you will hear people talking about ReAct, and this is not what you might think of; it is simply a synergy of reasoning and acting in large language models. Google Research, yeah, well, what a coincidence, and Princeton, and they combine this. This is the research paper, please have a look: ReAct, reasoning and acting in large language models. It just gives you some ideas. So we combine the abilities for reasoning, as in chain-of-thought prompting, and acting, and yes, yes, yes, the knowledge base, and I just told you everything.

The nice thing here is, if we look again at question answering, this is the easiest topic, and you can also verify it just by seeing the results. You see here that you have significantly better results. The research paper tells you that applying ReAct outperforms imitation and reinforcement learning methods, and reinforcement learning is a big part of ChatGPT, by an absolute success rate of 34 percent and 10 percent. So these are really jumps that you achieve in the accuracy, in the success rate of the system, when you prompt a system in this ReAct way. And as I showed you, it is rather easy to implement. You do not need any specific code; you just have to know how to do it. So yes, yes, yes, text, text, text; let's have a look at an example.

So here we go. You have a question, and the question here is: aside from the Apple Remote, what other devices can control the program the Apple Remote was originally designed to interact with? Beautiful. Now if you have a GPT system, you get a standard reply, "iPod", which is, okay, wrong. Beautiful. If you then just apply this chain of
thought, and you have here this input to the question, you get an answer that has multiple elements but is also wrong. And then if you take from ReAct only the act part, so you do a search, and this is more or less what you.com or Perplexity AI would be, when I showed you that they have access to Google Search or to Microsoft Bing, the search engine: you see here you have an Act 1, the search; you have an observation of the agent, which is now getting some additional information from its environment; you have an Act 2, another search if this does not deliver the results you're looking for; you have a second observation; and then somewhere you stop and they give you an answer, and the answer is wrong.

And now the beauty is, of course, if you combine reasoning and acting, so thinking and action together. They give you this example where it says, Thought 1: I need to search Apple Remote and find the program it was originally designed to interact with. So you have an action, you start Action 1: you search for Apple Remote on a search engine, and you get some paragraph back, some text back. Then you have your Thought 2, Apple Remote, yeah, yeah, so you search with a more detailed question. You get the observation back: hey, I could not find this. So you say, no problem, if you did not find this, we need to search for a new, I don't know, term, topic, whatever. And now the act of search happens, the observation comes back, "is discontinued", and then you know exactly your Thought 4, and this finally is the thought that brings you to the finish and gives you the correct answer.

So you see, you have this kind of reasoning and action that you present to the system, so that the system can learn from it. Think about it as a cookbook; it's a recipe for an LLM, for ChatGPT or whatever autoregressive system you work with. This is the recipe: you tell the system in the input prompt, hey, listen, this is my recipe, do it. I think this is the easiest
way. Yeah, there's decision making; I would not go into decision making. If you want to have an idea, the theory behind it is simple. They say that they augment the agent's action space, where they give it another dimension, or another space in itself, which is the space of language. So an action, a-hat, is now part of the language space, "which we will refer to as a thought or a reasoning trace", but this does not affect the external environment, so we have no observation feedback from it. But we have extended our, now think about a vector space, so we have more or less more possibilities. And they say, as the language space is unlimited, yes, yes, yes, a PaLM 540B is prompted with few-shot in-context examples to generate both domain-specific actions and free-form language thoughts for task solving. It's nice, it's a little bit challenging. They have some nice results, but I would say have a look at the data yourself.

Just to give you an idea, this is what I wanted to show you as the last information from this research paper: whether you have the learning as a prompt or as the classical fine-tune. They show you here the scaling results for prompting and fine-tuning. Fine-tuning is always for a specific task, and the task is question answering, using this ReAct scheme. They have different models here, and you see in blue the standard model, in orange the chain of thought, in green, yeah, act-only, and then in red the ReAct way. And as you can see here from the model size, this one, for example, is a 540-billion PaLM model, and you see, independent of the methodology that you use here, you are at about, I don't know, 25 or whatever that percentage indicates, success rate or normalized success rate or whatever. But now look here at what you can achieve with real fine-tuning. Of course, I mean, if you really have a huge dataset that you can fine-tune the whole
system on, you achieve, even with the 62-billion PaLM model, let's have a look at ReAct, the best result of those four methodologies, and even higher than with the 540-billion model where you just have the learning methodology in the prompting. Of course, if you have really high-quality data, thousands and thousands and thousands of elements of new data on which you can run a fine-tune, it is better than when you give just eight examples in your input prompt to a pre-trained LLM. But just to show you: with fine-tuning, if you have the chance to really fine-tune a system, if you have the data, the human-generated data, wherever you collect your data, fine-tuning is the way to go. Even with ReAct, you see, you can come down from a 540-billion system to just a 62-billion system in PaLM, and you get even better results with fine-tuning. This was it for here; I don't want to go into further details. Yeah, there are different benchmarks; have a look at this, it is really interesting.

And now there's something else, and this will be the last part of this talk today. You have now seen that it matters how you structure your input prompt, how you structure the question: that you do not just have a question, but you provide information before you ask the system, so that the system can learn from this right in place when you ask something. How is this possible? I showed you that it is possible, but what is the deeper explanation? There is a beautiful paper from Google Research, MIT and Stanford, and they try to come up with an explanation. It is mathematically a little bit challenging but a really interesting paper; please have a look at it. The question is: why is in-context learning in autoregressive systems like large language models possible? What happens? How can you simply provide some input, some demonstrations in the question field, so that the system learns from this without being fine-tuned in the heavy, old-fashioned way?
And the answer they come up with is that there's a system within the system: the hidden layers have, if you want, a self-similar, I will not call it a reasoning structure, a self-similar connectome developed, so that very simple tasks like regression can be performed, although the model was not exactly fine-tuned for this task but just given five, six, seven, eight examples. From a theoretical point of view this is a fascinating question, but I think it would be a little bit too much for today. Have a look at the paper; it's about 10 to 12 pages of mathematics. If you like mathematics, go for it.

I say thank you, I hope you enjoyed it a little bit, and I'll see you in my next video.
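The few-shot chain-of-thought prompting described in the talk can be sketched in a few lines of Python. This is a minimal illustration, not the prompt format from the paper: the Exemplar class, the build_cot_prompt function, the "Q:/A:" layout and the sample arithmetic questions are all hypothetical and made up here. The resulting string would be sent to whatever LLM you use, which is not shown.

```python
from dataclasses import dataclass


@dataclass
class Exemplar:
    question: str
    reasoning: str  # the worked path of reasoning, the "chain of thought"
    answer: str


def build_cot_prompt(exemplars: list[Exemplar], question: str) -> str:
    """Concatenate worked examples, then the new question with an open answer slot."""
    blocks = [
        f"Q: {ex.question}\nA: {ex.reasoning} The answer is {ex.answer}."
        for ex in exemplars
    ]
    # The final block leaves "A:" open so the model continues in the same style.
    blocks.append(f"Q: {question}\nA:")
    return "\n\n".join(blocks)


# Made-up exemplars; the talk suggests roughly five to eight in practice.
exemplars = [
    Exemplar(
        question="A shelf holds 3 books. Two boxes with 4 books each are added. "
                 "How many books are on the shelf?",
        reasoning="The shelf starts with 3 books. 2 boxes of 4 books is 8 books. 3 + 8 = 11.",
        answer="11",
    ),
    Exemplar(
        question="A bus has 12 passengers. 5 get off and 3 get on. "
                 "How many passengers are on the bus?",
        reasoning="12 - 5 = 7 passengers remain. 7 + 3 = 10.",
        answer="10",
    ),
]

prompt = build_cot_prompt(
    exemplars,
    "A jar has 6 marbles. Three bags with 5 marbles each are poured in. "
    "How many marbles are in the jar?",
)
print(prompt)
```

The design choice here is simply that every exemplar shows the reasoning before the answer, so that the model's continuation after the last "A:" tends to follow the same pattern.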

Original Description

Prompt engineering was yesterday. New insights into in-context learning to achieve significantly better results w/ all autoregressive LLMs (like ChatGPT, BioGPT or PaLM 540B). Latest research on Chain-of-Thought Prompting (CoT) and ReAct, combining reasoning and action (agent receives external data). The practice of fine-tuning BioBERT vs. prompting GPT-3 vs. prefix-tuning BioGPT on biomedical data (like PubMedQA). Why does in-context learning work w/ ChatGPT? Apply intelligent input prompts to a pre-trained LLM system (like ChatGPT) and avoid expensive domain-specific fine-tuning, since comparable results are achievable when applying the latest research results and insights.

I recommend this research literature:

Chain-of-Thought Prompting Elicits Reasoning in Large Language Models
https://arxiv.org/pdf/2201.11903v6.pdf

ReAct: Synergizing Reasoning and Acting in Language Models
https://arxiv.org/pdf/2210.03629.pdf

Ask Me Anything: A Simple Strategy for Prompting Language Models (AMA)
https://arxiv.org/pdf/2210.02441.pdf

What Learning Algorithm Is In-Context Learning? Investigations with Linear Models
https://arxiv.org/pdf/2211.15661.pdf

Large Language Models Are Human-Level Prompt Engineers
https://arxiv.org/pdf/2211.01910.pdf

BioGPT: Generative Pre-trained Transformer for Biomedical Text Generation and Mining
https://arxiv.org/pdf/2210.10341.pdf

#in-context_learning

Playlist

Uploads from Discover AI · Discover AI · 33 of 60

← Previous Next →

Step Into the Unknown (by YouChat) - May 2023 be your best year yet

Wishing you all an amazing 2023 filled with Love, Laughter, and Happiness!

Create a Smarter Future!

The Art of Text to Vector Transformation: A Comprehensive Look at AI and NLP Transformers

Feature Vectors: The Key to Unlocking the Power of BERT and SBERT Transformer Models

Domain-Specific AI Models: How to Create Customized BERT and SBERT Models for Your Business

Achieve Unimaginable Levels of Domain Knowledge through SBERT Extreme in 3D (SBERT 48)

Unlocking Scientific Domain Knowledge w/ BPE Tokenizer: An Amazing Journey! (SBERT 49)

SBERT Extreme 3D: Train a BERT Tokenizer on your (scientific) Domain Knowledge (SBERT 50)

Discover Vision Transformer (ViT) Tech in 2023

Pre-Train BERT from scratch: Solution for Company Domain Knowledge Data | PyTorch (SBERT 51)

Flan-T5-XL model on a free COLAB | A free LLM - that explains itself w/ reasoning /write essay | AI

BERT and GPT in Language Models like ChatGPT or BLOOM | EASY Tutorial on Large Language Models LLM

Free Alternative to ChatGPT: Flan-T5-XL GUI (open-source) #shorts

From T5 to T5X: A Game-Changing Evolution with JAX & FLAX

How to start with ChatGPT? | Short Introduction to OpenAI API #shorts

The Future of Conversational AI? Google's PaLM w/ RLHF | LLM ChatGPT Competitor

Microsoft and ChatGPU

From Zero to FLAN-T5 XL Model GUI with Gradio: A Step-by-Step Guide on Free COLAB Notebook PyTorch

Google's 2nd Answer to "BING ChatGPT": Sparrow | after BARD w/ LaMDA | 2nd Gen Conversational AI

TF2: Pre-Train BERT from scratch (a Transformer), fine-tune & run inference on text | KERAS NLP

3D Visualization for BERT: How to Pre-Train with a New Layer & Fine-Tune with Downstream Task Layer

FLAN-T5-XXL on NVIDIA A100 GPU w/ HF Inference Endpoints, let's explore 11b models!

ChatGPT - Can it Lie to you?

ChatGPT Alternative: Perplexity by Perplexity.AI

2023 KerasNLP Tutorial: Explore Latest KERAS Toolbox & NLP Processing Library for BERT - TF2

Self-aware AI: You.com/chat vs Perplexity.ai | Live Demo, LLMs show Future of ChatGPT w/ BING

BLOOM 176B Inference on AWS | Bigger than GPT-3 for more Power!

Fine-tune ChatGPT? Buy Embeddings /OpenAI? What are Embeddings? My own ChatGPT? | Visual Q+A

Unleashing the Power of BLOOM 176B with AWS ml.p4de.24xlarge, DJL & DeepSpeed: The Ultimate Boost!

After ChatGPT: NEW BioGPT by Microsoft | Do YOU trust Microsoft for your Medication?

Improve ChatGPT: Modular, Adaptive, Smart LLM | Inside ChatGPT

Fine-tune ChatGPT w/ in-context learning ICL - Chain of Thought, AMA, reasoning & acting: ReAct

The Intersection of Copyright Law and Human Faces: Exploring Virtual K-Pop with MAVE

New TECH: Vision Transformer 2023 on Image Classification | AI

PyTorch code Vision Transformer: Apply ViT models pre-trained and fine-tuned | AI Tech

New BING ChatGPT: Unlock the Power of Emotions in your Search Engine!

New BING ChatGPT loses its mind

Self-Attention Heads of last Layer of Vision Transformer (ViT) visualized (pre-trained with DINO)

Visualizing the Self-Attention Head of the Last Layer in DINO ViT: A Unique Perspective on Vision AI

Microsoft strongly restricts access to ChatGPT on new BING - WHY?

PyTorch ViT: The Ultimate Guide to Fine-Tuning for Object Identification (COLAB)

New BING Chat AGGRESSIVE

Panoptic Image Segmentation: Mask2Former explained | Identify all objects!

Code Panoptic Image Segmentation w/ Vision Transformer & Mask2Former - A PyTorch tutorial

Dream Job Alert: AI Prompt Engineer - $335K | AI Prompt Design: A Crash Course

Streamlining Similar Image Detection with ViT in PyTorch: A Step-by-Step Guide

Microsoft's CEO in Trouble #shorts

Why wait for KOSMOS-1? Code a VISION - LLM w/ ViT, Flan-T5 LLM and BLIP-2: Multimodal LLMs (MLLM)

OpenAI's ChatGPT can NOW summarize external Sources on the Internet?

ChatGPT polarizes

Hospital /Clinic AI Decision Models: Performance of 12 AI LLM Systems (incl $$) Radiology, Biomed

ChatGPT Prompt Engineering w/ in-context learning (ICL) - 7 Examples | Tutorial

Chat with your Image! BLIP-2 connects Q-Former w/ VISION-LANGUAGE models (ViT & T5 LLM)

ChatGPT: Multidimensional Prompts

ChatGPT: In-context Retrieval-Augmented Learning (IC-RALM) | In-context Learning (ICL) Examples

Code your BLIP-2 APP: VISION Transformer (ViT) + Chat LLM (Flan-T5) = MLLM

Buy Microsoft "Azure OpenAI Service" or buy from OpenAI its API for ChatGPT access & tuning?

Pretraining vs Fine-tuning vs In-context Learning of LLM (GPT-x) EXPLAINED | Ultimate Guide ($)

Reversible Transformer: ReFORMER for GPU Memory Optimization! Reversible Residual Layers?

The video shows how to get better results from ChatGPT and other autoregressive LLMs through in-context learning (ICL) instead of weight updates, using Chain-of-Thought prompting, Ask Me Anything (AMA) prompting, and ReAct (reasoning and acting). It also explores prefix tuning as a lightweight alternative to full fine-tuning for natural language generation, and walks through practical steps and tools. The key insight is that well-constructed prompts applied to a pre-trained LLM can achieve results comparable to expensive domain-specific fine-tuning.
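A minimal sketch of what such a prompt can look like, assuming a plain "Q:/A:" layout with the reasoning written out before each answer. The `Example` class, the `build_cot_prompt` helper, and the two demonstrations are hypothetical and invented for illustration; they are not taken from the video or the papers.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Example:
    question: str
    reasoning: str  # intermediate steps, written out in plain language
    answer: str


def build_cot_prompt(instruction: str, examples: List[Example], question: str) -> str:
    """Prepend an instruction and worked examples to the new question."""
    parts = [instruction.strip(), ""]
    for ex in examples:
        parts.append(f"Q: {ex.question}")
        parts.append(f"A: {ex.reasoning} The answer is {ex.answer}.")
        parts.append("")
    # The last block is left open so the model continues with its own reasoning.
    parts.append(f"Q: {question}")
    parts.append("A:")
    return "\n".join(parts)


# Made-up demonstrations; a real prompt would use examples from the target domain.
EXAMPLES = [
    Example(
        "A ward has 3 rooms with 4 beds each. How many beds are there?",
        "There are 3 rooms and each has 4 beds, so 3 * 4 = 12.",
        "12",
    ),
    Example(
        "A box holds 10 vials and 6 are used. How many vials are left?",
        "The box starts with 10 vials and 6 are removed, so 10 - 6 = 4.",
        "4",
    ),
]

prompt = build_cot_prompt(
    "Answer the question. Think step by step.",
    EXAMPLES,
    "A clinic sees 5 patients per hour for 3 hours. How many patients is that?",
)
print(prompt)
```

The resulting string is what gets sent to the pre-trained model as input; no model weights change at any point.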

Key Takeaways

Reformulate the input for in-context learning (ICL)
Provide about 5-8 question-answer pairs as in-context examples, the range the video recommends
Combine Chain-of-Thought prompting with acting on external information (ReAct)
Use prefix tuning to optimize a small task-specific prefix while the model weights stay frozen
Compare how results scale for prompting and for fine-tuning before deciding to fine-tune a model
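The reasoning-and-acting takeaway can be sketched as a loop that alternates model output with observations from an external source. This is a toy under stated assumptions: `scripted_model` is a canned placeholder for a real LLM call, `lookup` and `FACTS` stand in for an external data source, and the `Action: name[argument]` text format is one common way to write ReAct traces, not necessarily the one used in the video.

```python
import re
from typing import Callable, Dict, Optional, Tuple

# Hypothetical stand-in for an external data source (a real agent would call an API).
FACTS = {
    "BioGPT": "BioGPT is a generative Transformer pre-trained on biomedical literature.",
}


def lookup(term: str) -> str:
    return FACTS.get(term, "No entry found.")


TOOLS: Dict[str, Callable[[str], str]] = {"lookup": lookup}
ACTION_RE = re.compile(r"Action: (\w+)\[(.*?)\]")


def scripted_model(transcript: str) -> str:
    """Placeholder for an LLM call: returns canned steps based on the transcript."""
    if "Observation:" not in transcript:
        return "Thought: I should look up BioGPT first.\nAction: lookup[BioGPT]"
    return "Thought: The observation answers the question.\nAction: finish[biomedical literature]"


def react(question: str, model: Callable[[str], str], max_steps: int = 5) -> Tuple[Optional[str], str]:
    """Alternate reasoning steps and tool calls until the model emits finish[...]."""
    transcript = f"Question: {question}"
    for _ in range(max_steps):
        step = model(transcript)
        transcript += "\n" + step
        match = ACTION_RE.search(step)
        if match is None:
            break  # no action requested, stop the loop
        name, arg = match.groups()
        if name == "finish":
            return arg, transcript
        tool = TOOLS.get(name)
        observation = tool(arg) if tool else f"Unknown tool: {name}"
        # The observation is appended so the next model call can reason over it.
        transcript += f"\nObservation: {observation}"
    return None, transcript


answer, trace = react("What kind of text was BioGPT pre-trained on?", scripted_model)
print(trace)
```

Swapping `scripted_model` for a function that sends the transcript to an actual LLM keeps the loop unchanged, which is the main reason for passing the model in as a callable.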

💡 In-context learning lets a pre-trained LLM reach results comparable to domain-specific fine-tuning, without updating any weights

More on: LLM Foundations

Getting Started with Vertex AI Gemini 1.5 Flash

I TRAINED AN AI TO SOLVE 2+2 (w/ Live Coding)

How to use the ChatGPT API with Python!!

Nicholas Renotte

Gemini 2.5: Create an interactive plot of economic data

Google DeepMind

LangChain Chatbots: Building a Personalized AI Assistant

Analytics Vidhya

Auto-generating meeting notes with Python

Related AI Lessons

Building LSTMs with PyTorch and Lightning AI Part 7: Resuming Training with Checkpoints

Learn to resume LSTM training with checkpoints using PyTorch and Lightning AI, enabling efficient model iteration and development

Dev.to · Rijul Rajesh

How AI Learns with Less Labeled Data

Learn how AI can learn with less labeled data, a crucial aspect of machine learning beyond model selection

Comparing Sarvam-30B and Qwen2.5–14B on Spider Text-to-SQL: An Active-Parameter Perspective

Learn how to compare large language models like Sarvam-30B and Qwen2.5-14B on the Spider Text-to-SQL benchmark from an active-parameter perspective

Debugging Benchmark: DeepSeek V4 Pro vs MiMo V2.5 Pro

Compare the debugging capabilities of DeepSeek V4 Pro and MiMo V2.5 Pro on a real-world GitHub bug

Dev.to · Stanislav

5 Levels of AI Agents - From Simple LLM Calls to Multi-Agent Systems

Dave Ebbelaar (LLM Eng)