What is GPT-3 and how does it work? | A Quick Review

AssemblyAI · Beginner · 🧠 Large Language Models · 4y ago

Key Takeaways

This video teaches the basics of GPT-3 and how it works

Full Transcript

If you're into deep learning or artificial intelligence in general, it's very likely that you've heard of GPT-3. And even if you haven't heard of it, you might have used a tool that was built with GPT-3. But what is it and how does it work? Let's learn that in this video.

GPT-3 is a language model that was developed by OpenAI. What's special about it is that it performs really well on multiple NLP tasks and that it is very big. It is based on a very big transformer network: with 175 billion parameters, it was the biggest dense network at the time of its release.

You might ask, what is a language model? Well, a language model is basically a probabilistic model that is able to guess what the next word in a sentence should be.

What makes GPT-3 unique is the lack of fine-tuning. You might say that there are many models you can build that will guess the next word in a sentence, but that's not the only thing GPT-3 does. The network is trained only on the task of guessing what the next word should be, but in this manner it learns how the language works, and as a result it is able to perform many other NLP tasks. In this way, it works somewhat like how a human learns: once you know a language, you can translate from it into another language, you can tell what the next word in a sentence should be, or you can fill in the blanks of a sentence.

The dominant approach to NLP tasks before GPT-3 was to fine-tune a model for each specific task. On certain tasks, state-of-the-art fine-tuned models still work very well, even better than GPT-3, but GPT-3 performs much better when it comes to machine translation, filling in the blanks, and question answering. This was a very big deal in the world of artificial intelligence because it basically showed that a very big unsupervised model can match or even surpass the performance of fine-tuned models.

All right, so now let's look into how GPT-3 works and how it learns.
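To make the idea of "a probabilistic model that guesses the next word" concrete, here is a minimal sketch of a bigram language model that simply counts which word follows which. This is not how GPT-3 is implemented (GPT-3 uses a transformer network, not word counts); the function names and the tiny corpus are made up for illustration.

```python
from collections import Counter, defaultdict


def train_bigram_model(text):
    """Count, for every word, which words follow it in the training text."""
    words = text.lower().split()
    followers = defaultdict(Counter)
    for current_word, next_word in zip(words, words[1:]):
        followers[current_word][next_word] += 1
    return followers


def next_word_probabilities(model, word):
    """Turn the follower counts for `word` into a probability distribution."""
    counts = model[word.lower()]
    total = sum(counts.values())
    return {candidate: count / total for candidate, count in counts.items()}


# Toy corpus, invented for this example.
corpus = "the cat sat on the mat . the cat ate the fish ."
model = train_bigram_model(corpus)

probs = next_word_probabilities(model, "the")
best_guess = max(probs, key=probs.get)

print(probs)       # probability of each word that followed "the"
print(best_guess)  # the most likely next word
```

A model like GPT-3 plays the same guessing game, but it conditions on a long context of preceding tokens rather than on a single previous word, and it learns its probabilities with a neural network instead of a lookup table.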
So, the architecture is very similar to the Transformer architecture that we covered in an earlier video. If you don't remember it, or if you haven't watched that video, go ahead and check the video we made on transformers to learn more about how they work and what their architecture is.

The main difference between the Transformer architecture and the GPT-3 architecture, or the general GPT architecture OpenAI came up with, is which part of the Transformer it uses. If you remember, the Transformer architecture has an encoder and a decoder, whereas GPT-3 uses only decoder blocks. Another difference is that in the Transformer architecture, each decoder has a masked self-attention layer, an encoder-decoder attention layer, and a feedforward neural network, with layer normalizations in between. Inside the GPT-3 decoder blocks, however, we only have a masked self-attention layer and a feedforward neural network layer. So, basically, they got rid of the encoder-decoder attention layer. On top of this, different locations for the layer normalization inside the decoder block were tried across the GPT architectures. And with GPT-3, they also introduced alternating dense and sparse self-attention layers.

To create GPT-3, these layers of decoders were trained on 300 billion tokens, tokens being either words or parts of words. This data was collected from the internet and also from books.

A very strong model like this comes with some drawbacks, of course. The main ones are unwanted bias against minority groups that was present in the collected data, the environmental impact of training a model as big as GPT-3, and the potential abuse of the system to create fake articles or fake news.

But even though there are drawbacks with models like these, they also still benefit humanity. GPT-3 has already been used as the base of tools and companies that help people in their everyday life. Some examples are creative writing, code generation, content creation, and customer service. There will be a link in the description to a webpage with a comprehensive list of the tools and companies that are built on GPT-3 and their different domains.

Microsoft holds an exclusive license to the model itself, but if you want to use GPT-3 to develop apps, you can now access it publicly through OpenAI's API. Of course, when you're creating apps, you have to abide by their ground rules.

Overall, this is a very exciting development for the world of AI and the world generally. If you want to learn more about the latest developments in AI and about the techniques that are used in these technologies, don't forget to subscribe to our channel so you can be one of the first people to know when we release a new video. Don't forget to give us a like for this video if you liked it and leave a comment with your thoughts or your questions. We would be delighted to see them. But for now, have a nice day and I'll see you around.
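The decoder block described in the transcript (a masked self-attention layer followed by a feedforward network, with no encoder-decoder attention) can be sketched in a few lines of NumPy. This is a simplified illustration, not OpenAI's implementation: it assumes a single attention head, a ReLU activation, layer normalization applied before each sub-layer and without learned scale or shift, random untrained weights, and toy sizes. All names (`decoder_block`, `params`, `w_q`, and so on) are hypothetical.

```python
import numpy as np


def layer_norm(x, eps=1e-5):
    """Normalize each token vector to zero mean and unit variance."""
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)


def softmax(x):
    """Row-wise softmax, shifted by the row maximum for numerical stability."""
    x = x - x.max(axis=-1, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=-1, keepdims=True)


def masked_self_attention(x, w_q, w_k, w_v):
    """Single-head self-attention where a token cannot see later tokens."""
    seq_len = x.shape[0]
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = q @ k.T / np.sqrt(k.shape[-1])
    # True above the diagonal: position i must not attend to position j > i.
    future = np.triu(np.ones((seq_len, seq_len), dtype=bool), k=1)
    scores = np.where(future, -np.inf, scores)
    weights = softmax(scores)
    return weights @ v, weights


def decoder_block(x, params):
    """Masked self-attention, then a feedforward network, each with a residual."""
    attn_out, weights = masked_self_attention(
        layer_norm(x), params["w_q"], params["w_k"], params["w_v"]
    )
    x = x + attn_out
    hidden = np.maximum(0, layer_norm(x) @ params["w_1"])  # ReLU (assumption)
    x = x + hidden @ params["w_2"]
    return x, weights


# Toy sizes and random weights, chosen only for the example.
rng = np.random.default_rng(0)
seq_len, d_model, d_ff = 4, 8, 32
params = {
    "w_q": rng.normal(size=(d_model, d_model)),
    "w_k": rng.normal(size=(d_model, d_model)),
    "w_v": rng.normal(size=(d_model, d_model)),
    "w_1": rng.normal(size=(d_model, d_ff)),
    "w_2": rng.normal(size=(d_ff, d_model)),
}
x = rng.normal(size=(seq_len, d_model))  # stand-in for 4 token embeddings

out, weights = decoder_block(x, params)
print(out.shape)
print(np.round(weights, 2))  # entries above the diagonal are zero
```

The mask is what makes the block suitable for next-word prediction: the output at each position depends only on that position and the ones before it, so the model can be trained to predict every next token in a sequence in one pass.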

Original Description

You probably have heard of GPT-3 and how it is a fascinating development. But have you learned why or how GPT-3 managed to impress so many people? In this video, we will learn why GPT-3 is so unique, and how it manages to help bring in a new wave of excitement for AI. On top of this, we will also briefly look under the hood of GPT-3 to understand its architecture and some of its potential dangers.

Want to give AssemblyAI’s automatic speech-to-text transcription API a try? Get your free API token here 👇
https://www.assemblyai.com/?utm_source=youtube&utm_medium=referral&utm_campaign=yt_mis_12

Apps made with GPT-3: https://gpt3demo.com/

B-roll credits:
Video by Julia M Cameron (https://www.pexels.com/@julia-m-cameron) from Pexels
Video by Jack Sparrow (https://www.pexels.com/@jack-sparrow) from Pexels
