What is GPT-3 and how does it work? | A Quick Review
Key Takeaways
This video teaches the basics of GPT-3 and how it works
Full Transcript
If you're into deep learning or artificial intelligence in general, it's very likely that you've heard of GPT-3. And even if you haven't heard of it, you might have used a tool that was built with GPT-3. But what is it and how does it work? Let's learn that in this video.
GPT-3 is a language model that was developed by OpenAI. What's special about it is that it performs really well on multiple NLP tasks and that it is very big. It is based on a very big transformer network. At the time of its release, it was the biggest dense network there was, in fact, with 175 billion parameters.
You might ask, what is a language model? Well, a language model is basically a probabilistic model that is able to guess what the next word in a sentence should be. What makes GPT-3 stand out is that it does not need task-specific fine-tuning. You might say that there are many models you can build that will guess the next word in a sentence, but that's not the only thing that GPT-3 does. The network is only trained on the task of guessing what the next word should be, but in this manner it learns how the language works, and as a result it is able to perform many other NLP-related tasks. In this way, it works somewhat like how a human learns. Once you know a language, you can translate from it into another language, you can tell what the next word in a sentence should be, or you can fill in the blanks of a sentence.
The dominant approach to NLP tasks before GPT-3 was to fine-tune a model for each specific task. On certain tasks, state-of-the-art fine-tuned models still work very well, even better than GPT-3, but GPT-3 performs strongly, in some cases matching or beating them, on tasks such as machine translation, filling in the blanks, and question answering. And this was a very big deal in the world of artificial intelligence because it showed that a very big unsupervised model can match or even surpass the performance of fine-tuned models. All right, so now let's look into how GPT-3 works and how it learns.
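To make the "guess the next word" idea concrete, here is a minimal sketch of a language model in Python. It is a simple bigram counter, not a transformer and not how GPT-3 is implemented; the toy corpus and the function names `train_bigram_model` and `next_word_probs` are assumptions made up for this illustration.

```python
from collections import Counter, defaultdict


def train_bigram_model(text):
    """Count how often each word follows each other word in the text."""
    words = text.lower().split()
    counts = defaultdict(Counter)
    for prev, nxt in zip(words, words[1:]):
        counts[prev][nxt] += 1
    return counts


def next_word_probs(counts, prev):
    """Turn the raw follower counts for `prev` into probabilities."""
    followers = counts.get(prev.lower(), Counter())
    total = sum(followers.values())
    if total == 0:
        return {}  # the word was never seen with a follower
    return {word: n / total for word, n in followers.items()}


# Toy corpus (hypothetical); GPT-3 was trained on hundreds of billions of tokens.
corpus = "the cat sat on the mat . the cat ate the fish ."
model = train_bigram_model(corpus)
probs = next_word_probs(model, "the")

print(probs)                        # probability of each word that followed "the"
print(max(probs, key=probs.get))    # the single most likely next word
```

GPT-3 does the same kind of thing in spirit, assigning a probability to every possible next token, but it conditions on a long context rather than only on the previous word, and it learns those probabilities with a neural network rather than by counting.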
So, the architecture is very similar to the transformer architecture that we learned. If you don't remember it or if you haven't watched that video, go ahead and check the video we made on transformers to learn more about how they work and what their architecture is. The main difference between the transformer architecture and the GPT-3 architecture, or the general GPT architecture OpenAI came up with, is which part of the transformer it uses. If you remember, a transformer has an encoder and a decoder, whereas GPT-3 only uses decoder blocks.
Another difference is that in the transformer architecture, each decoder has a masked self-attention layer, an encoder-decoder attention layer, and a feedforward neural network, with layer normalization in between. With GPT-3, however, the decoder blocks only have a masked self-attention layer and a feedforward neural network layer. So, basically, they got rid of the encoder-decoder attention layer. On top of this, different locations for the layer normalization inside the decoder block were tried with the GPT architecture. And with GPT-3, they also introduced alternating dense and sparse self-attention layers.
To create GPT-3, these layers of decoders were trained on 300 billion tokens, tokens being either words or parts of words. This data was collected from the internet and also from books.
A very strong model like this comes with some drawbacks, of course. The main ones are unwanted bias against minority groups that is present in the collected data, the environmental impact of training a model as big as GPT-3, and the potential abuse of the system to create fake articles or fake news. But even though there are drawbacks with models like these, they also still benefit humanity. GPT-3 has already been used as the base of tools and companies that help people in their everyday life.
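The decoder block described in the architecture part above (masked self-attention followed by a feedforward layer) can be sketched in plain Python. This is a toy illustration under loud simplifying assumptions, not GPT-3's actual code: it uses a single attention head, no learned weight matrices (queries, keys, and values are the inputs themselves), a bare ReLU in place of the feedforward network, and it leaves out layer normalization entirely. All function names are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def masked_self_attention(x):
    """Single-head causal self-attention over a list of token vectors.

    Assumption: queries, keys, and values are the raw inputs, with no
    learned projections. A real model learns three projection matrices.
    """
    d = len(x[0])
    out, weights = [], []
    for i, q in enumerate(x):
        # Causal mask: position i only scores positions 0..i, never the future.
        scores = [sum(a * b for a, b in zip(q, x[j])) / math.sqrt(d)
                  for j in range(i + 1)]
        w = softmax(scores)
        # Pad with zeros so every row has one weight per token.
        weights.append(w + [0.0] * (len(x) - i - 1))
        out.append([sum(w[j] * x[j][k] for j in range(i + 1))
                    for k in range(d)])
    return out, weights


def feed_forward(v):
    """Toy stand-in for the feedforward layer: a ReLU with no weights."""
    return [max(0.0, a) for a in v]


def decoder_block(x):
    """Masked self-attention, then feedforward, each with a residual add."""
    attended, weights = masked_self_attention(x)
    h = [[a + b for a, b in zip(xi, ai)] for xi, ai in zip(x, attended)]
    out = [[a + b for a, b in zip(hi, feed_forward(hi))] for hi in h]
    return out, weights


# Three made-up 2-dimensional token vectors.
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
out, weights = decoder_block(tokens)
for row in weights:
    print([round(w, 2) for w in row])
```

Each printed row shows how much one token attends to every position. The zeros to the right of the diagonal are the mask at work: a token never sees the words that come after it, which is what lets the model be trained on guessing the next word.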
Some examples of what GPT-3-based tools are used for are creative writing, code generation, content creation, and customer service. There will be a link in the description to a webpage with a comprehensive list of the tools and companies that are built on GPT-3 and their different domains. GPT-3 is exclusively licensed to Microsoft, which gives it access to the underlying model, but if you want to use GPT-3 to develop apps, you can now access it publicly through OpenAI's API. Of course, when you're creating apps, you have to abide by their ground rules.
Overall, this is a very exciting development for the world of AI and the world generally. If you want to learn more about the latest developments in AI and the techniques that are used in these technologies, don't forget to subscribe to our channel so you can be one of the first people to know when we release a new video. Don't forget to give us a like for this video if you liked it and leave a comment with your thoughts or your questions. We would be delighted to see them. But for now, have a nice day and I'll see you around.
Original Description
You probably have heard of GPT-3 and how it is a fascinating development. But have you learned why or how GPT-3 managed to impress so many people?
In this video, we will learn why GPT-3 is so unique, and how it manages to help bring in a new wave of excitement for AI. On top of this, we will also briefly look under the hood of GPT-3 to understand its architecture and some of its potential dangers.
Want to give AssemblyAI’s automatic speech-to-text transcription API a try? Get your free API token here 👇
https://www.assemblyai.com/?utm_source=youtube&utm_medium=referral&utm_campaign=yt_mis_12
Apps made with GPT-3: https://gpt3demo.com/
B-roll credits:
Video by Julia M Cameron (https://www.pexels.com/@julia-m-cameron) from Pexels
Video by Jack Sparrow (https://www.pexels.com/@jack-sparrow) from Pexels