GPT - Explained!

CodeEmporium · Advanced · 📐 ML Fundamentals · 3y ago

Skills: LLM Foundations 90% · Fine-tuning LLMs 90% · Prompt Craft 80% · ML Maths Basics 80% · Multimodal LLMs 70%

Key Takeaways

The video explains the fundamentals of GPT, GPT-2, GPT-3, and ChatGPT, covering topics such as transfer learning, fine-tuning, and meta-learning, with a focus on language modeling and self-supervised learning. It highlights the differences between GPT models, including zero-shot learning, one-shot learning, and few-shot learning, and discusses the advantages and disadvantages of fine-tuning and meta-learning.

Full Transcript

Hello everyone, welcome to another episode of Code Emporium, where we're going to talk about GPT. I've structured this video as a flow from Transformer neural networks to GPT-3 and then eventually ChatGPT. I'm hoping it'll help you grasp the overall landscape of language modeling. So let's get to it, and for more videos like this, consider subscribing.

Transformers are sequence-to-sequence architectures: they convert one sequence to another. Sequences have a defined ordering. Sentences, for example, are sequences of words, and so Transformers can also be used to solve natural language problems such as text translation. To train these architectures, however, we need a ton of labeled data on that specific task. This would be difficult for Transformers or any other model to learn. So how would we make it easier for models to learn with less data? Think about it. [Music] Correct, the answer is transfer learning. What a smarty.

So let's combine the Transformer neural network with transfer learning. Transformers have two parts, an encoder and a decoder. Each of them is able to learn a good representation of language, so good that we can create language models from each part. You stack the encoders to get Bidirectional Encoder Representations from Transformers, that's BERT, and you stack the decoder units to get Generative Pre-trained Transformers, or GPT. Each of these architectures has created its own line of research. In this video I'll be focusing on GPT, but for more information on BERT, I have other videos that you can check out.

Now, before we nosedive into GPT, let's talk about transfer learning. Training a model from scratch requires a lot of data because the parameters are randomly initialized. But what if the parameters just happened to be initialized to values that are close to the values that we need? In that case, we don't really need too much data to get to where we need to be. So here's the situation: we have some model that has randomly initialized parameters. It's then trained on some first task, and the parameter values are updated because of that training. Now this model has some sort of knowledge, so to speak, and we can use this base knowledge to further train with data from another task. This is akin to transferring knowledge from one task to another, hence the name transfer learning. This is the exact idea GPT and BERT use.

In this context, GPT training is divided into two parts. We have pre-training, where we train the GPT architecture to understand what language is, and then fine-tuning, where we use transfer learning to further train the GPT architecture to perform well on specific language tasks. Let's talk a little bit about each.

GPT is pre-trained on the task of language modeling. This is essentially a task where the model is given random sentence parts and is made to predict the word that will come next. Why language modeling? It is chosen because it acts as a good base for understanding the fundamentals of language, and the resulting model can be easily fine-tuned. Language modeling is often referred to as a self-supervised task, as the sentences themselves form both the input and the output labels. In some papers you might see this called unsupervised learning.

The GPT fine-tuning task depends on what task we want to perform. This could be text translation, question answering, or text summarization, among many others. These are typically supervised tasks that we would provide training data for, with inputs and labels. This approach works because we end up with a good model that requires less data than we would have needed had we trained the model from scratch.

However, there are some issues with this fine-tuning approach. First, too much data is still required: for every single task we want to accomplish in NLP, we still need to collect a dataset of hundreds of thousands of examples. This limits what we can do with language models. Another issue is overfitting. These models are huge, the pre-training dataset is broad but the fine-tuning dataset is narrow, and this may lead to parameter changes that harm performance. We would need to make sure the distribution of our fine-tuning dataset is a good representation of what we see in the wild.

Another issue is that humans learn from just a few examples, whereas fine-tuning requires thousands to hundreds of thousands of examples. Broadly, the direction we want to take the fields of deep learning and natural language processing is along the lines of human intelligence. Humans learn with just a few examples, not a hundred thousand, to actually be good at a task. And if we do build some system, we want it to be able to context-switch very fluidly. For example, we want it to interleave talking in text with computing some small math operations in between, because language sometimes just works that way, and we might need to make calculations on the fly while we are talking mid-sentence.

One potential solution to address these concerns is meta learning. This approach was introduced in the next version of GPT, that is, GPT-2.
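As a rough illustration of why language modeling counts as self-supervised, the sketch below turns a single sentence into (context, next word) training pairs, so the text supplies its own labels. The function name `next_word_examples` and the whitespace tokenization are assumptions made for this example; real GPT models operate on subword tokens rather than whole words.

```python
from typing import List, Tuple


def next_word_examples(sentence: str) -> List[Tuple[List[str], str]]:
    """Turn one sentence into (context, next_word) training pairs.

    Hypothetical helper for illustration only: it splits on whitespace,
    whereas actual GPT models use a subword tokenizer.
    """
    words = sentence.split()
    # Every prefix of the sentence is an input; the word right after it is the label.
    return [(words[:i], words[i]) for i in range(1, len(words))]


if __name__ == "__main__":
    for context, label in next_word_examples("transformers convert one sequence to another"):
        print(" ".join(context), "->", label)
```

No human annotation is involved here: each label is simply the next word of the original sentence, which is why the same text can serve as both input and output.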
GPT-2 is similar to the original GPT model in the sense that it still has the same pre-training phase with language modeling, but instead of the fine-tuning approach, we use something called zero-shot learning. Zero-shot learning entails that we don't make any parameter updates once the model has been pre-trained. Instead, at inference time, we pass in the input as we usually would, but also pass in a prompt that says what should be done with the input. The issue with this approach is that zero-shot learning is very hard for the model, so we need to scale the architecture up to capture as many patterns in the language as we possibly can during pre-training. GPT-2 was trained with 1.5 billion parameters for this reason. The approach, though, did not perform as well as fine-tuning on a number of benchmarks. However, scaling the architecture did still help performance in some way.

Continuing this line of thought, what would happen if we used the same strategy of meta learning but scaled the architecture even more? This is what led to the third generation of GPT models. GPT-3 is a large language model with 175 billion parameters. Like its GPT and GPT-2 predecessors, it was pre-trained with the language modeling objective, and then, rather than being fine-tuned per task, it was used with the meta-learning approach. But instead of just zero-shot learning, as in GPT-2, it could use any of the meta-learning techniques: zero-shot learning, one-shot learning, and even few-shot learning. So let's talk about each.

Zero-shot learning, as we mentioned before, is where we just feed a prompt along with our input. With this, there is less of a chance of strange correlations compared to fine-tuning, and our model would be more robust. The disadvantage, though, is that it's really difficult even for humans to start without a single example, so this strategy is considered unfairly hard for the model.

Then we have one-shot learning. Along with what we feed for zero-shot learning, we also feed one example of what we want. All of this is pushed as a vector into what we call the model's context window.

And then we have few-shot learning, which is exactly like one-shot learning, but instead of just one complete example, we feed multiple examples. This could typically range from 10 to 100 examples, or whatever fits in the model's context window. Overall, GPT-3 has been pretty good, even sometimes outperforming its fine-tuned counterparts on certain tasks.

Now, in conclusion, I just want to say that fine-tuning and meta learning have their own advantages and disadvantages. Meta learning has not clearly supplanted fine-tuning. After all, the first version of ChatGPT, released in late 2022, actually has a fine-tuned GPT model at its core, which shows there is still promise in that direction. And overall, there's always something evolving in the field, so it's exciting to follow along. That's where I'm going to end the video. Thanks so much for watching, check the description for some fun resources and videos that probably have my face on them, and I will see you all in the next one. Bye!
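The difference between zero-shot, one-shot, and few-shot learning described in the transcript comes down to how many worked examples are placed in the prompt, with no parameter updates in any case. The following is a minimal sketch of that idea. The helper `build_prompt` and the `=>` layout are assumptions chosen for illustration and are not an official prompt format; the resulting string would still need to be sent to a language model, which is not shown here.

```python
from typing import Sequence, Tuple


def build_prompt(
    instruction: str,
    query: str,
    examples: Sequence[Tuple[str, str]] = (),
) -> str:
    """Assemble a prompt with zero, one, or several in-context examples.

    Hypothetical helper: the "input => output" layout is an assumption.
    """
    lines = [instruction]
    # Each demonstration is an (input, expected output) pair shown to the model.
    for source, target in examples:
        lines.append(f"{source} => {target}")
    # The final line is the actual query, left open for the model to complete.
    lines.append(f"{query} =>")
    return "\n".join(lines)


if __name__ == "__main__":
    task = "Translate English to French:"
    demos = [("sea otter", "loutre de mer"), ("peppermint", "menthe poivrée")]

    print(build_prompt(task, "cheese"))              # zero-shot: instruction only
    print(build_prompt(task, "cheese", demos[:1]))   # one-shot: one example
    print(build_prompt(task, "cheese", demos))       # few-shot: several examples
```

In practice the number of examples is bounded by the model's context window, so a real implementation would also need to count tokens before adding more demonstrations.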

Original Description

Let's talk about GPT, GPT-2, GPT-3 and ChatGPT in 10 minutes

ABOUT ME
⭕ Subscribe: https://www.youtube.com/c/CodeEmporium?sub_confirmation=1
📚 Medium Blog: https://medium.com/@dataemporium
💻 Github: https://github.com/ajhalthor
👔 LinkedIn: https://www.linkedin.com/in/ajay-halthor-477974bb/

RESOURCES
[1 🔎] GPT-3 Main Paper: https://arxiv.org/pdf/2005.14165.pdf
[2 🔎] GPT-2 Main Paper: https://d4mucfpksywv.cloudfront.net/better-language-models/language_models_are_unsupervised_multitask_learners.pdf
[3 🔎] GPT original paper: https://s3-us-west-2.amazonaws.com/openai-assets/research-covers/language-unsupervised/language_understanding_paper.pdf
[4 🔎] A very nice intuitive understanding of GPT-3 architecture: https://dugas.ch/artificial_curiosity/GPT_architecture.html

PLAYLISTS FROM MY CHANNEL
⭕ ChatGPT Playlist of all other videos: https://youtube.com/playlist?list=PLTl9hO2Oobd9coYT6XsTraTBo4pL1j4HJ
⭕ Transformer Neural Networks: https://youtube.com/playlist?list=PLTl9hO2Oobd_bzXUpzKMKA3liq2kj6LfE
⭕ Convolutional Neural Networks: https://youtube.com/playlist?list=PLTl9hO2Oobd9U0XHz62Lw6EgIMkQpfz74
⭕ The Math You Should Know: https://youtube.com/playlist?list=PLTl9hO2Oobd-_5sGLnbgE8Poer1Xjzz4h
⭕ Probability Theory for Machine Learning: https://youtube.com/playlist?list=PLTl9hO2Oobd9bPcq0fj91Jgk_-h1H_W3V
⭕ Coding Machine Learning: https://youtube.com/playlist?list=PLTl9hO2Oobd82vcsOnvCNzxrZOlrz3RiD

MATH COURSES (7 day free trial)
📕 Mathematics for Machine Learning: https://imp.i384100.net/MathML
📕 Calculus: https://imp.i384100.net/Calculus
📕 Statistics for Data Science: https://imp.i384100.net/AdvancedStatistics
📕 Bayesian Statistics: https://imp.i384100.net/BayesianStatistics
📕 Linear Algebra: https://imp.i384100.net/LinearAlgebra
📕 Probability: https://imp.i384100.net/Probability

OTHER RELATED COURSES (7 day free trial)
📕 ⭐ Deep Learning Specialization: https://imp.i384100.net/Deep-Learning
📕 Python for Everybody: https://imp.i384100.net/python


This video provides an overview of GPT models, including GPT, GPT-2, GPT-3, and ChatGPT, and explains the concepts of transfer learning, fine-tuning, and meta-learning. It discusses the advantages and disadvantages of different learning approaches and highlights the importance of language modeling and self-supervised learning. By watching this video, viewers can gain a deeper understanding of GPT models and their applications.

Key Takeaways

Understand the basics of GPT models
Learn about transfer learning and fine-tuning
Explore meta-learning and its applications
Compare the advantages and disadvantages of fine-tuning and meta-learning
Apply knowledge of GPT models to real-world tasks

💡 GPT models have revolutionized the field of natural language processing, and understanding the differences between GPT, GPT-2, GPT-3, and ChatGPT is crucial for applying these models to real-world tasks.
