What is BERT? | Deep Learning Tutorial 46 (Tensorflow, Keras & Python)

codebasics · Beginner ·🧬 Deep Learning ·4y ago


Key Takeaways

The video explains the basics of BERT, a popular language model by Google, and how it is used to solve NLP tasks, with a demonstration using TensorFlow, Keras, and Python. It covers word embeddings, contextualized word embeddings, and how BERT is trained, and gives only a brief overview of the transformer architecture it is built on.

Full Transcript

To build a career in the natural language processing domain, you need to know BERT, which is a very popular language model by Google. In this video I'm going to explain, in very simple language that even a high school student can follow, what the point behind BERT is and how BERT is used in NLP tasks. We will not go into the details of the transformer architecture, but we'll have an overview of how BERT works and how it is used, and we'll also write some code in TensorFlow to generate sentence and word embeddings using BERT. So let's get started.

Let's assume you are working on a text classification task where the input to the model is a word and you want to classify it as either a person or a country. The input word here is Dhoni, who is an Indian cricket team captain, and he's a person, so you would classify the word Dhoni as a person. Now, the input is not an image; I am showing you the image just as a reference, but the input is only a word. If the input is Australia, you would of course classify it as a country. Mahmudullah, a Bangladesh cricket player, you would classify as a person.

Now think about how this model would process the input word. If it has seen the words Mahmudullah or Dhoni before, it can classify them as a person. But let's say the input word is Cummins. How does the model interpret this word and classify it as a person? It is a little bit challenging. The essence here is: how can we capture similarity between two words? Cummins is a person and a cricket player; Mahmudullah is also a person and a cricket player. How can you say that Mahmudullah and Cummins are similar, and that Australia, which is a country, and Cummins are not very similar?

Let's think about two homes. How do you say they are similar? You look at the features of the homes: bedrooms, area, bathrooms. Looking at those, you can say these two homes are kind of similar. But when you have a third home, which is a bigger one (10 bedrooms, 7,500 square feet, the kind of home a pretty rich person would own), you can say the second home and the third home are not similar. So if you can derive the features of an object, which is a home here, then by comparing those features you can say whether two objects are similar or not.

Similarly, think about how you can translate words such as Dhoni into features. The features could be: is it a person, is it healthy and fit, is it a location. The values are between 0 and 1: for healthy and fit, 0.9 means really healthy, and 0.1 means sick, a person who cannot run even one mile. If you compare these individual features, you can say Dhoni and Cummins are kind of similar but Australia is not: for Australia the location value is 1 while for Cummins it is 0, and the person value is 0 while for Cummins it is 1. So if you take all these numbers, create vectors out of them, and compare the vectors, you can say that Dhoni is more similar to Cummins, and that Cummins and Australia are not similar.

If you're building a model on, let's say, some cricket vocabulary, you might have words such as Ashes, bat, Cummins, and so on, and you can generate a feature vector for each of these words. These vectors are called word embeddings. We have covered them in previous videos, so I recommend you watch those. The essence here is that when you compare the feature vectors, or word embeddings, of Cummins and Dhoni, you will find that the two are kind of similar, and likewise Australia and Zimbabwe are kind of similar: those two are countries and the other two are people. This is a very powerful concept, and one of the ways you can generate a word embedding from a word is by using word2vec. Go to YouTube and search for "codebasics word2vec" or "codebasics word embedding" and watch those videos; you will get a good understanding.
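The idea of comparing feature vectors can be sketched in a few lines of plain Python. This is only an illustration: the three features and every number below are hand-picked assumptions in the spirit of the example above, not real word2vec or BERT embeddings, and cosine similarity is one common way to compare vectors (the video does not name a specific measure).

```python
import math

# Hypothetical hand-picked features, each scored between 0 and 1:
# [is_person, healthy_and_fit, is_location]
embeddings = {
    "dhoni":     [1.0, 0.9, 0.0],
    "cummins":   [1.0, 0.8, 0.0],
    "australia": [0.0, 0.0, 1.0],
    "zimbabwe":  [0.0, 0.0, 1.0],
}

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

people = cosine_similarity(embeddings["dhoni"], embeddings["cummins"])
mixed = cosine_similarity(embeddings["cummins"], embeddings["australia"])
countries = cosine_similarity(embeddings["australia"], embeddings["zimbabwe"])

print(f"dhoni vs cummins:      {people:.3f}")
print(f"cummins vs australia:  {mixed:.3f}")
print(f"australia vs zimbabwe: {countries:.3f}")
```

With these made-up numbers the two players score close to 1 and the player/country pair scores 0, which is the behaviour the hand-built example is meant to convey. Learned embeddings have hundreds of dimensions whose meaning is not hand-labelled like this.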
Here we picked these features by hand, but in real life the models can figure out the features on their own. It almost sounds magical, but it is possible; for that you have to watch those videos.

The issue with word2vec is this. Carefully read these two sentences: the meaning of "fair" is very different in them. In the first one, fair means unbiased, as in equal treatment. In the second one, fair means a carnival, a fun activity. But word2vec will generate a fixed embedding vector. If you use a fixed embedding vector in both contexts, it's not right, because the meaning of fair in the two sentences is different. So you need a model that can generate a contextualized meaning of a word, meaning it looks at the whole sentence and, based on that, generates the number representation for the word.

BERT allows you to do exactly that. It will generate contextualized embeddings: for these two sentences, it will generate different word embeddings for "fair". Here this value is 1 and here it is 0; you can compare the two and they are different. At the same time it captures the meaning of a word in the right way, so that when you have a statement like "Tom deserves unbiased judgment", unbiased and fair are kind of similar, and you will see it generates a vector which is similar: see, 1 and 1, 0.9 and 0.8, and so on. Similarly, when you have a statement like "Carnival was packed with fun activities", carnival and fair are similar in these two sentences, and it will generate similar embeddings. So you can see BERT is very powerful: it can look at the context of the statement and generate a meaningful number representation for a given word.

It can also generate an embedding for an entire sentence. Let's say you're working on a movie review classification task: for the whole sentence it can generate a single vector. Usually BERT will generate a vector of size 768. It's just a number, it could be anything, but usually the vectors have this many dimensions.

I came across this very good blog post on BERT by Jay Alammar. He explains things in a visual way, so you will understand more details about BERT. BERT is based on the transformer architecture, which is the latest one as of 2021 and very widely used in the industry; you have to know it if you are in the NLP domain. There are two versions of BERT, BERT base and BERT large. BERT base uses 12 encoder layers and BERT large uses 24. If you want to understand what these encoder layers are and the details of the model itself, you can go through this article; if you don't want to bother with that, it is okay, you can just follow my presentation and understand the overview of BERT. In this article you will see that for BERT you have to put a special token called CLS at the beginning of a sentence, and at the end you put a special token called SEP, the separator. He talks about all of that in this article: see, CLS and mask. We have not covered the masked language model yet; we'll cover it shortly. You have a word like this, and it will generate the individual vector. So it is a useful article to go through.

BERT was trained by Google on 2,500 million words from Wikipedia and 800 million words from different books. They trained BERT using two approaches. One is the masked language model. I have this Wikipedia article on Elon Musk, and what they did is mask 15 percent of the words. For example, here there is the word "entrepreneur", so they would just mask that, generate training samples this way, and train the BERT model. By training BERT on this artificial task, they get word embeddings as a side effect. The end purpose really is to get word embeddings, but in order to get them you have to train the BERT model on artificial tasks. So the masked language model was the artificial task used to train the model, and as a side effect you get meaningful word and sentence embeddings.

The other task they trained on was next sentence prediction. For example, if I say "I am hungry", predict the next statement. If the next statement is "I would like to have pizza", the probability of that is higher than for "Table has four legs". Who cares, I'm hungry, give me some food, right? So the probability of that second statement is very low. Using these two approaches they trained the BERT model, and today Google Search is powered by BERT, so BERT has a direct impact on your life. Search became better in Google after they onboarded BERT in their search engine. Here is the full form, if you're curious about what BERT stands for.

Now let's look at TensorFlow code, and we'll generate some sentence and word embeddings in Python and TensorFlow. Let's try to locate the BERT model on the TensorFlow Hub website. If you google "TensorFlow Hub" you will get to TensorFlow Hub, which is a repository of all the different models. When you go to the models, under embedding you will see a section for BERT, and BERT has different models. L-12 means 12 layers, H-768 means a hidden state of size 768, and A-12 means 12 attention heads. This other one is a bigger one. So the one with 12 is BERT base and the one with 24 is BERT large. Jay Alammar's blog talks about this too: BERT base has 12 encoders and BERT large has 24.
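Returning briefly to the masked language model task described earlier in the theory part, here is a minimal sketch of how such training samples could be prepared. It is a simplification under stated assumptions: it splits on whitespace instead of using BERT's subword tokenizer, the sentence is made up, and the helper name `make_mlm_example` is hypothetical. The real BERT procedure also leaves some of the selected tokens unchanged or swaps them for random tokens rather than always inserting a mask.

```python
import random

MASK_TOKEN = "[MASK]"

def make_mlm_example(tokens, mask_fraction=0.15, seed=0):
    """Hide roughly `mask_fraction` of the tokens and remember the originals.

    Returns (masked_tokens, labels), where labels maps each masked
    position to the word the model would be trained to predict there.
    """
    rng = random.Random(seed)  # seeded so the sketch is reproducible
    n_to_mask = max(1, round(len(tokens) * mask_fraction))
    positions = sorted(rng.sample(range(len(tokens)), n_to_mask))

    masked_tokens = list(tokens)
    labels = {}
    for pos in positions:
        labels[pos] = tokens[pos]
        masked_tokens[pos] = MASK_TOKEN
    return masked_tokens, labels

# Made-up 20-word sentence, so 15 percent corresponds to 3 masked words.
tokens = ("the quick model reads every sentence in the article and learns "
          "to guess each hidden word from its surrounding context").split()

masked_tokens, labels = make_mlm_example(tokens)
print(" ".join(masked_tokens))
print("targets:", labels)
```

The model never sees the `labels` as input; it only sees the masked sentence and is scored on how well it recovers the hidden words, which is what forces it to use the surrounding context.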
So we are going to use the basic BERT model, this one. The good thing here is you can use this URL directly to download the model, or you can just copy it. It's 389 megabytes, so it's going to take some time. I will copy this URL here and create a variable called encoder_url. For each of these models there is a corresponding pre-processing URL; if you look at this table, there is a pre-processing URL, and the pre-processing model will pre-process your text. So I'm just going to copy it here and call it preprocess_url.

All right, so I have these two URLs, and the next step is to create a hub layer. hub is this import here, and from it you can create a hub layer. You pass in your pre-process URL, and what it gives you is like a function pointer. I will call it bert_preprocess_model. You can treat it as a function pointer: you supply a bunch of statements and it will do the pre-processing on them. Let's say I am building a movie review classification model and I have a statement like "nice movie indeed", or I have a different model and want to create a word embedding or a sentence embedding for the statement "I love python programming". Of course you do. So this is text_test; I will supply it to the pre-processing model and call the output object text_preprocessed. It's going to be a dictionary, so I will just print the keys, because the object might be big.

It pre-processed these two sentences and produced this object, so let's look at the individual elements in the dictionary. The first one is input_mask. The shape is 2 by 128: 2 because we have two sentences, so this is the mask for the first sentence and this is the mask for the second. Now the first sentence has three words, whereas the mask has five ones. What does that mean? The way BERT works is that it always puts a special token called CLS at the beginning, and to separate two sentences it puts a special token called SEP, the separator. If you count the tokens, one, two, three, four, five: that's five. The second sentence has four words, and four plus two is six. 128 is the maximum length of the sentence, so that's why you have 128 entries, and the remaining ones are 0 because you actually have only five tokens. So input_mask is pretty easy to understand.

The input_type_ids are really useful if you have multiple sentences in one statement. For our use case it won't be very interesting, everything is zero, so don't worry too much about it.

Now let's look at input_word_ids. Again, there was a special CLS token at the beginning and a separator token at the end. The word ID for CLS is 101 and for the separator it is 102, and the ones in between are the unique IDs for the individual words; they are IDs from a vocabulary. This is part of the pre-processing stage; in the next stage we will actually create the word embeddings. This is for the first statement, which is "nice movie indeed". The second statement is "I love python programming", so these are the input word IDs for that. You can see that CLS is always fixed at 101 and the separator is always fixed at 102.

Once the pre-processing stage is done, you want to create another layer. You use the same function, so I will just copy-paste this line, and the other layer takes the encoder URL. We will call this one bert_model. The BERT model acts like a function pointer, just like before, so you can treat it almost like a function and supply your pre-processed text. I will pass text_preprocessed to it, and this should generate my sentence and word embeddings. I'm going to store that in this object. It is a dictionary, so I'll get the keys of that dictionary. It's going to take some time, but it will come back at some point.

All right, so this has three keys; let's examine what those keys are. First we are going to look at pooled_output. The pooled output is an embedding for the entire sentence. We have two sentences, so for "nice movie indeed" this is the embedding, and the embedding vector size is 768. This 768-dimensional vector represents the statement "nice movie indeed" in the form of numbers. Similarly, for the second statement this is the embedding vector. This is pretty powerful: you can now use these vectors in your natural language processing task. It could be movie review classification, named entity recognition, it could be anything; BERT helps you generate a meaningful vector out of your statement.

Now let's look at the second one, which is sequence_output. The sequence output is the individual word embedding vectors. For each of the two sentences, for each word, it will have a vector of size 768. See, the size is 2 by 128 by 768. Why 128? 2 is for the two sentences, and each individual sentence has some padding, so in total it has 128 positions. For each word, say "nice", there is a vector of size 768; for "movie", this is the vector, and so on. Now you will say: if there is padding, why are there numbers there? Well, this is a contextualized embedding, so even the vector for padding will have some context from the sentence; that's why these have some values.

Now encoder_outputs. Let's look at the length of the encoder outputs: it is 12. The reason it is 12 is that we are using the small BERT, BERT base, which has 12 encoder layers: 1, 2, 3, up to 12.
Each layer has an embedding vector of size 768. So encoder_outputs is nothing but the output of each individual encoder layer; we have 12 layers, and that's why the length is 12. If I look at the first one, its shape is again 2 by 128 by 768: 2 because we have two sentences, 128 because each statement has 128 tokens including the padding, and for each token there is an embedding vector of size 768. The last one, from the last layer, is the same as your sequence_output. If you compare that particular tensor with sequence_output using the == operator, you will find that they are all the same. So I hope you're getting the point: encoder_outputs holds the output of all 12 encoder layers, and the last one is the same as the sequence output.

If you want to read more about the API, for example what the different elements do, the good thing is you can just copy-paste this URL into your browser, and below you will find some documentation. Here it says the last value of this list, which holds the outputs of the 12 transformer blocks, is equal to sequence_output. So read through this documentation.

I hope you found this tutorial useful. I'm going to put the code link in the video description below. In the next video we are going to use the pooled output, these embedding vectors, for movie review classification. So in this video I just showed you how you can use BERT to generate sentence embeddings; in the next one we'll do the actual movie review classification. If you liked this video, please give it a thumbs up. Your thumbs up is the fee for this session: you're learning things for free on YouTube, and your thumbs up is like paying me a fee. If you don't like it, give me a thumbs down, that is okay, but leave a comment so that I can improve in future videos. Goodbye.
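The coding walkthrough above can be sketched roughly as follows. Treat this as an outline under assumptions rather than the exact notebook: the two TF Hub handles, and in particular their version suffixes, are assumptions based on the model name read out in the video (L-12, H-768, A-12), and the wrapper function `embed_sentences` is a hypothetical helper added here. It requires the `tensorflow`, `tensorflow_hub`, and `tensorflow_text` packages plus network access, and the first call downloads several hundred megabytes.

```python
# Assumed TF Hub handles; the version suffixes (/3 and /4) may differ
# from the ones shown in the video.
PREPROCESS_URL = "https://tfhub.dev/tensorflow/bert_en_uncased_preprocess/3"
ENCODER_URL = "https://tfhub.dev/tensorflow/bert_en_uncased_L-12_H-768_A-12/4"

def embed_sentences(sentences):
    """Run a list of strings through BERT pre-processing and the encoder.

    Returns (text_preprocessed, outputs); both are dictionaries of tensors.
    """
    # Imported lazily so the constants above can be read without
    # TensorFlow installed.
    import tensorflow as tf
    import tensorflow_hub as hub
    import tensorflow_text  # noqa: F401  registers ops used by the pre-processing model

    bert_preprocess_model = hub.KerasLayer(PREPROCESS_URL)
    bert_model = hub.KerasLayer(ENCODER_URL)

    # Keys described in the transcript: input_mask, input_type_ids, input_word_ids
    text_preprocessed = bert_preprocess_model(tf.constant(sentences))

    # Keys described in the transcript: pooled_output, sequence_output, encoder_outputs
    outputs = bert_model(text_preprocessed)
    return text_preprocessed, outputs
```

Calling `embed_sentences(["nice movie indeed", "I love python programming"])` should, going by the shapes discussed in the video, give a `pooled_output` of shape 2 by 768, a `sequence_output` of shape 2 by 128 by 768, and an `encoder_outputs` list of length 12 whose last element matches `sequence_output`.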

Original Description

What is BERT (Bidirectional Encoder Representations from Transformers) and how is it used to solve NLP tasks? This video provides a very simple explanation. I am not going to go into the details of how the transformer-based architecture works; instead I will give an overview so that you understand the usage of BERT in NLP tasks. In the coding section we will generate sentence and word embeddings using BERT for some sample text. We will cover topics such as:
* Word2vec vs BERT
* How BERT is trained on the masked language model and next sentence completion tasks

⭐️ Timestamps ⭐️
00:00 Introduction
00:39 Theory
11:00 Coding in tensorflow

Code: https://github.com/codebasics/deep-learning-keras-tf-tutorial/blob/master/46_BERT_intro/bert_intro.ipynb
BERT article: http://jalammar.github.io/illustrated-bert/
Word2Vec video: https://www.youtube.com/watch?v=hQwFeIupNP0

Do you want to learn technology from me? Check https://codebasics.io/?utm_source=description&utm_medium=yt&utm_campaign=description&utm_id=description for my affordable video courses.

Deep learning playlist: https://www.youtube.com/playlist?list=PLeo1K3hjS3uu7CxAacxVndI4bE_o3BDtO
Machine learning playlist: https://www.youtube.com/playlist?list=PLeo1K3hjS3uvCeTYTeyfe0-rN5r8zn9rw

🔖Hashtags🔖 #bertmodelnlppython #tensorflowbert #tensorflowberttutorial #bert #bertneuralnetwork #bertdeeplearning #whatisbert #bertnlp #bertindeeplearning #bertmodel #bertmodelnlp

🌎 My Website For Video Courses: https://codebasics.io/?utm_source=description&utm_medium=yt&utm_campaign=description&utm_id=description
Need help building software or data analytics and AI solutions? My company https://www.atliq.com/ can help. Click on the Contact button on that website.
🎥 Codebasics Hindi channel: https://www.youtube.com/channel/UCTmFBhuhMibVoSfYom1uXEg

#️⃣ Social Media #️⃣
🔗 Discord: https://discord.gg/r42Kbuk
📸 Dhaval's Personal Instagram: https://www.instagram.com/dhavalsays/
📸 Instagram: https://www.instagram.com/c



This video provides an introduction to BERT, its architecture, and its applications in NLP tasks, with a practical example using TensorFlow, Keras, and Python. It covers the basics of word embeddings and contextualized word embeddings, and explains how BERT embeddings can feed tasks such as movie review classification and named entity recognition.

Key Takeaways

Locate the BERT model on the TensorFlow Hub website
Copy the encoder URL of the BERT base model and the matching pre-processing URL
Create a hub layer from the pre-processing URL
Call that layer on your sentences to pre-process the text
Inspect the resulting dictionary: input_word_ids, input_mask, and input_type_ids
Create a second hub layer from the encoder URL and pass it the pre-processed text
Read the sentence embedding from pooled_output and the per-word embeddings from sequence_output
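
To make the pre-processing output in the steps above concrete, here is a toy re-creation in plain Python. It is a sketch under assumptions, not the real pre-processing model: the vocabulary IDs are made up (only 101 for CLS and 102 for SEP come from the video), the helper names are hypothetical, and words are split on whitespace, whereas the real model uses a subword tokenizer that may split one word into several tokens.

```python
SEQ_LEN = 128              # maximum sequence length mentioned in the video
CLS_ID, SEP_ID = 101, 102  # special-token IDs mentioned in the video
PAD_ID = 0

def build_toy_vocab(sentences, first_id=1000):
    """Assign made-up IDs to whole words (not the real BERT vocabulary)."""
    vocab = {}
    for sentence in sentences:
        for word in sentence.lower().split():
            vocab.setdefault(word, first_id + len(vocab))
    return vocab

def toy_preprocess(sentence, vocab, seq_len=SEQ_LEN):
    """Mimic the three outputs of BERT pre-processing for one sentence."""
    word_ids = [CLS_ID] + [vocab[w] for w in sentence.lower().split()] + [SEP_ID]
    n_real = len(word_ids)
    n_pad = seq_len - n_real
    return {
        "input_word_ids": word_ids + [PAD_ID] * n_pad,
        "input_mask": [1] * n_real + [0] * n_pad,  # 1 = real token, 0 = padding
        "input_type_ids": [0] * seq_len,           # all zero for a single segment
    }

sentences = ["nice movie indeed", "I love python programming"]
vocab = build_toy_vocab(sentences)
for sentence in sentences:
    out = toy_preprocess(sentence, vocab)
    print(sentence, "->", out["input_word_ids"][:7],
          "| real tokens:", sum(out["input_mask"]))
```

The three-word sentence ends up with five real tokens and the four-word sentence with six, because CLS and SEP are always added; everything after that is padding up to 128.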

💡 BERT uses contextualized word embeddings to capture the nuances of language, making it a powerful tool for NLP tasks.


Chapters (3)

0:00 Introduction

0:39 Theory

11:00 Coding in tensorflow
