What is BERT? | Deep Learning Tutorial 46 (Tensorflow, Keras & Python)

codebasics · Beginner ·🧬 Deep Learning ·4y ago

Key Takeaways

The video explains the basics of BERT, a popular language model by Google, and how it is used to solve NLP tasks, with a demonstration using TensorFlow, Keras, and Python. It covers word embeddings, contextualized word embeddings, and a high-level overview of BERT's transformer-based architecture and the two tasks it was trained on.

Full Transcript

To build a career in the natural language processing domain, you need to know BERT, a very popular language model by Google. In this video I'm going to explain, in very simple language that even a high school student can follow, what the point behind BERT is and how BERT is used in NLP tasks. We will not go into the details of the transformer architecture, but we'll have an overview of how BERT works and how the BERT model is used, and we'll also write some code in TensorFlow to generate sentence and word embeddings using BERT. So let's get started.

Let's assume you are working on a text classification task where the input to the model is a word and you want to classify it as either a person or a country. The input word here is Dhoni, who is an Indian cricket team captain, and he's a person, so you would classify the word Dhoni as a person. The input is not an image; I am showing you the image just as a reference, and the input is only a word. If the input is Australia, you would of course classify it as a country. Mahmudullah, a Bangladesh cricket player, you would classify as a person.

Now think about how this model would process the input word. If it has seen the words Mahmudullah or Dhoni before, it can classify them as a person. But let's say the input word is Cummins. How does the model interpret this word and classify it as a person? It is a little bit challenging, and you might be a little confused about how the model would do it. The essence here is: how can we capture similarity between two words? Similarity as in: Cummins is a person and a cricket player, and Mahmudullah is also a person and a cricket player. How can you say that Mahmudullah and Cummins are similar, and that Australia, which is a country, and Cummins are not very similar?

Let's think about it this way: if you have two homes, how do you say they are similar? You look at the features of the homes. The features are
bedrooms, area, and bathrooms. For these two homes you can say, yeah, they're kind of similar. But when you have a third home which is a bigger one, with 10 bedrooms and 7,500 square feet (a pretty rich person would own this kind of home), you can say the second home and the third home are not similar. So for an object, which is a home here, if you can derive the features, then by comparing those features you can say whether two objects are similar or not.

Similarly, think about how you can translate these words, Dhoni and so on, into features. The features could be these: person, healthy and fit, location. The values are between 0 and 1; 0.9 means really healthy, and 0.1 means sick, a person who cannot run even one mile. If you compare these individual features, you can say Dhoni and Cummins are kind of similar but Australia is not, because, see, Australia's location value is 1 and here the location value is 0; its person value is 0 and here the person value is 1. So if you take all these numbers, create vectors out of them, and compare those vectors, you can say that Dhoni is more similar to Cummins, and that Cummins and Australia are not similar.

If you're building a model on, let's say, some cricket vocabulary, you might have words such as Ashes, bat, Cummins, etc., and you can generate a feature vector for each of these words. These vectors are called word embeddings, and we have covered them in previous videos, so I recommend you watch those videos. The essence here is that when you compare the feature vectors, or word embeddings, of Cummins and Dhoni, you will find that these two are kind of similar, and Australia and Zimbabwe are kind of similar: those two are countries, those two are people. This is a very powerful concept, and one of the ways you can generate the word embedding for a word is by using word2vec. Go to YouTube and search for "codebasics word2vec" or "codebasics word embedding"; watch those videos and you will get a good understanding.
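The comparison of feature vectors described above can be sketched in a few lines of plain Python. This is a minimal illustration, not the code from the video: the three features and all of their values are made-up assumptions, and cosine similarity is one common way to compare two vectors.

```python
import math

# Hand-made feature vectors in the order [person, healthy_and_fit, location].
# All numbers are illustrative assumptions, not output of a trained model.
features = {
    "Dhoni":     [1.0, 0.9, 0.0],
    "Cummins":   [1.0, 0.8, 0.0],
    "Australia": [0.0, 0.0, 1.0],
}


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


sim_players = cosine_similarity(features["Dhoni"], features["Cummins"])
sim_mixed = cosine_similarity(features["Cummins"], features["Australia"])

print(f"Dhoni vs Cummins:     {sim_players:.3f}")
print(f"Cummins vs Australia: {sim_mixed:.3f}")
```

With these toy numbers the two players score close to 1 and the player/country pair scores 0, which is the kind of gap a real word embedding is meant to capture with learned rather than hand-picked features.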
Here we picked these features by hand, but in real life the models can figure out these features on their own. It almost sounds like magic, but it is possible; for that you have to watch those videos.

But the issue with word2vec is this. Carefully read these two sentences: the meaning of "fair" is very different in them. In the first one, fair means unbiased, as in equal treatment. In the second one, fair means a carnival, a fun activity. But word2vec will generate only one fixed embedding vector. If you use a fixed embedding vector in both contexts, it's not right, because the meaning of fair in the two sentences is different. So you need a model that can generate the contextualized meaning of a word, meaning it looks at the whole sentence and, based on that, generates the number representation for a word.

BERT allows you to do exactly that. It generates a contextualized embedding: when you have these two sentences and look at the embeddings for the word, it will generate them differently. Here, this one here is zero; you can compare these two and they are different. At the same time it captures the meaning of a word in the right way, so that when you have a statement like "Tom deserves unbiased judgment", unbiased and fair are kind of similar, and you will see it generates a vector which is similar: see, 1, 1, 0.9, 0.8, and so on. Similarly, when you have a statement like "the carnival was packed with fun activities", carnival and fair are similar in these two sentences, and it will generate similar embeddings.

So you can see BERT is very, very powerful: it can look at the context of the statement and generate a meaningful number representation for a given word. It can also generate an embedding for an entire sentence. Let's say you're working on a movie review classification task: for the whole sentence it can generate a single vector. Usually BERT will generate a vector
of size 768. It's just a number and it could be anything, but usually the vector has this many dimensions.

I came across this very good blog post on BERT by Jay Alammar. He explains things in a visual way, so you will understand more details about BERT. BERT is based on the transformer architecture, which is the latest one as of 2021 and very widely used in the industry; you have to know it if you are in the NLP domain. There are two versions of BERT: BERT base and BERT large. BERT base uses 12 encoder layers, and BERT large uses 24 encoder layers. If you want to understand what these encoder layers are and the details of the model itself, you can go through this article. If you don't want to bother with it, that is okay; you can just follow my presentation and understand the overview of BERT.

In this article, see, for BERT you have to use special tokens: at the beginning of a sentence you use a special token called CLS, and at the end you use a special token called SEP, or separator. He talks about all of that in this article. See, CLS and mask. We did not cover the masked language model yet; we'll cover that later. But see, you have a word like this and it will generate the individual vector. So you can go through this useful article.

BERT was trained by Google on 2,500 million words from Wikipedia and 800 million words from different books. They trained BERT using two approaches. One is the masked language model. I have this Wikipedia article on Elon Musk, and what they did is mask 15 percent of the words. For example, they would just mask this word here, generate training samples this way, and train the BERT model. Now, when they train the BERT model on this artificial task, as a side effect they are getting word
embeddings. So really the end purpose is to get word embeddings, but in order to get them you have to train the BERT model on artificial tasks. The masked language model was the artificial task they used to train the model, and as a side effect you get meaningful word and sentence embeddings.

The other task they trained on was next sentence prediction. For example, if I say "I am hungry", predict the next statement. If the next statement is "I would like to have pizza", the probability of that happening is higher than for "the table has four legs". Who cares? I'm hungry, give me some food, right? So the probability of that statement is very, very low.

Using these two approaches they trained the BERT model, and today Google Search is powered by BERT, so BERT has a direct impact on your life. Search became better in Google after they brought BERT into their search engine. Here is the full form, Bidirectional Encoder Representations from Transformers, if you're curious about what BERT means.

Now let's look at TensorFlow code, and we'll generate some sentence and word embeddings in Python and TensorFlow. Let's try to locate the BERT model on the TensorFlow Hub website. If you google "tensorflow hub", you will get to TensorFlow Hub, which is a repository of all the different models. When you go to see the models, under embedding you will see a section for BERT, and BERT has different models: L-12 means 12 layers, hidden size 768, 12 attention heads. This one is a bigger one. So the one with 12 is BERT base and the one with 24 is BERT large. If you read Jay Alammar's blog, it talks about BERT base, which has 12 encoders, and BERT large, which has 24.
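To make the masked language model task from the theory part concrete, here is a minimal sketch of how one training sample could be built by hiding roughly 15 percent of the words. The function name `make_masked_sample`, the example sentence, and the whitespace tokenization are assumptions for illustration; the actual BERT pipeline works on WordPiece tokens and sometimes substitutes a random or unchanged token instead of the mask, which this sketch leaves out.

```python
import random

MASK_TOKEN = "[MASK]"


def make_masked_sample(sentence, mask_fraction=0.15, seed=0):
    """Hide about mask_fraction of the words (at least one).

    Returns (masked_tokens, labels) where labels maps each masked
    position to the original word the model would have to predict.
    """
    rng = random.Random(seed)  # seeded so the sketch is reproducible
    tokens = sentence.split()
    n_to_mask = max(1, round(len(tokens) * mask_fraction))
    positions = rng.sample(range(len(tokens)), n_to_mask)
    labels = {i: tokens[i] for i in sorted(positions)}
    masked = [MASK_TOKEN if i in labels else tok for i, tok in enumerate(tokens)]
    return masked, labels


# Made-up example sentence, not a quote from the article shown in the video.
sentence = "Elon Musk is an entrepreneur and business magnate"
masked, labels = make_masked_sample(sentence)

print(" ".join(masked))  # the model's input, with one word hidden
print(labels)            # the training target: position -> original word
```

The model never sees the hidden word in its input; it is trained to recover it from the surrounding context, and the embeddings fall out of that training as a side effect.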
So we are going to use the basic BERT model, this one. The good thing here is that you can use this URL directly to download the model, or you can just copy it. It's 389 megabytes, so it's going to take some time. I will just copy this URL here and create a variable called encoder_url. For each of these models there is a corresponding preprocessing URL; if you look at this table, there is a preprocessing URL. The preprocessing model will preprocess your text. I'm just going to copy it here and call it preprocess_url.

All right, so I have these two URLs, and the next step is to use hub. Out of it you can create a hub Keras layer (hub.KerasLayer) and pass in your preprocess URL, and what it gives you is like a function pointer. I will call it bert_preprocess_model. You can treat it as a function pointer: you supply a bunch of statements and it does the preprocessing on those statements. Let's say I am building a movie classification model and have a statement like this, or I have a different model and want to create a word embedding or a sentence embedding for this statement: "I love python programming". Of course you do. So this is text_test; I will supply that to the model and call the output object text_preprocessed. It's going to be a dictionary, so I will just print the keys, because the object might be big.

It preprocessed these two sentences and produced this object, so let's look at the individual elements in this dictionary. The first one is input_mask. The shape is 2 by 128: 2 because we have two sentences. For the first sentence this is the mask, and for the second one this is the mask. Now, the first sentence has three words, whereas the mask has five ones. What does that mean? Let's try to understand. The way BERT works is that it will always put a special token
called CLS at the beginning, and to separate two sentences it will put a special token called SEP, the separator. So now if you count the tokens, one, two, three, four, five: see, five. And here there are four words, and four plus two is six. And 128 is the maximum length of the sentence, which is why you have 128; the remaining values are 0 because you actually have only five tokens. So input_mask is pretty easy to understand.

The input_type_ids are useful if you have multiple sentences in one statement. For our use case they won't be very interesting, everything is zero, so don't worry too much about them.

Now let's look at input_word_ids. Again, I need to put this here: there was the special CLS token at the beginning and the separator token at the end. The word ID for CLS is 101, for the separator it is 102, and these are the individual unique IDs for the words; these are IDs from a vocabulary. This is part of the preprocessing stage; in the next stage we will actually create the word embeddings. This is for the first statement, which is "nice movie indeed". The second statement is "I love python programming", so these are the input word IDs for that. You can see that for CLS it is always fixed at 101, and for the separator it is always fixed at 102.

Once the preprocessing stage is done, you want to create another layer, using the same function. I will just copy-paste this one, and the other layer will take the encoder URL. We will call it bert_model. The BERT model will act like a function pointer, just like before, so you can treat it almost like a function and supply your preprocessed text. I will supply text_preprocessed, and this should generate my sentence and word embeddings. I'm going to store that in this object, and I will
call .keys() on it, since this is a dictionary. It's going to take some time, but it will come back at some point. All right, this has three keys; let's examine what those keys are.

First we are going to look at pooled_output. The pooled output is an embedding for the entire sentence. We have two sentences, so for "nice movie indeed" this is the embedding, and the embedding vector size is 768. This 768-element vector represents the statement "nice movie indeed" in the form of numbers. Similarly, for the second statement, this is the embedding vector. This is pretty powerful: you can now use these vectors in your natural language processing task. It could be movie review classification, named entity recognition, anything; BERT helps you generate a meaningful vector out of your statement.

Now let's look at the second one, which is sequence_output. The sequence output holds the individual word embedding vectors. For each of the sentences, for each word, it has a vector of size 768. See, the size is 2 by 128 by 768. The 2 is for the two sentences, and each individual sentence has some padding, so you have 128 positions in total. And for each word, say "nice", there is a vector of size 768; for "movie" this is the vector, and so on. Now you will say, okay, if there is padding, why are there numbers? Well, this is a contextualized embedding, so even the vector for a padding position will carry some context; that's why these have values.

Now look at encoder_outputs. Let's look at the length of encoder_outputs: that will be 12. The reason it is 12 is that we are using the small BERT base, which has 12 encoder layers: 1, 2, 3, up to 12.
Each layer has a 768-size embedding vector. So encoder_outputs is nothing but the output of each individual encoder; we have 12, which is why 12 is the size. If I look at the first one, it will again be 2 by 128 by 768: 2 because we have two sentences, 128 because the statement has 128 positions including the padding, and for each position there is a 768-size embedding vector. By the way, the last one, from the last layer, is the same as your sequence_output. If you compare this vector with sequence_output, see, they are the same; you can use the == operator and you will find that all the values are the same. So I hope you're getting the point: encoder_outputs holds the output of all 12 encoder layers, and the last one is the same as the sequence output.

If you want to read more about the API, like what the different elements do, the good thing is that you can just copy-paste this URL, like this, and below you will find some documentation. Here it says the last value of this list is equal to sequence_output. So read through this documentation.

I hope you found this tutorial useful. I'm going to put the code link in the video description below. In the next video we are going to use the pooled output, these embedding vectors, for movie review classification. In this video I just showed you how you can use BERT to generate sentence embeddings; in the next one we'll do the actual movie review classification.

I hope you liked this video. If you did, please give it a thumbs up. Your thumbs up is the fee for this session: you're learning things for free on YouTube, but your thumbs up is like paying me a fee. If you don't like it, give it a thumbs down, that is okay, but leave a comment so that I can improve in future videos. Goodbye.
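The layout of the preprocessing output walked through in the coding part (CLS at the start, SEP at the end, zero padding up to 128, a mask of ones over the real tokens) can be imitated in plain Python. This is only a toy sketch: `toy_preprocess` and `toy_vocab` are hypothetical names, the word IDs other than 101 and 102 are invented, and the real preprocessing model uses a WordPiece vocabulary rather than a whitespace split.

```python
CLS_ID, SEP_ID, PAD_ID = 101, 102, 0  # 101 and 102 are the IDs quoted in the transcript
SEQ_LENGTH = 128                      # fixed sequence length seen in the output shapes

# Hypothetical toy vocabulary; the IDs are invented for this sketch.
toy_vocab = {
    "nice": 1001, "movie": 1002, "indeed": 1003,
    "i": 1004, "love": 1005, "python": 1006, "programming": 1007,
}


def toy_preprocess(sentences, seq_length=SEQ_LENGTH):
    """Build the three fields with the same layout as the real preprocessing output."""
    word_ids, mask, type_ids = [], [], []
    for sentence in sentences:
        ids = [CLS_ID] + [toy_vocab[w] for w in sentence.lower().split()] + [SEP_ID]
        padding = seq_length - len(ids)
        word_ids.append(ids + [PAD_ID] * padding)    # token IDs, then zero padding
        mask.append([1] * len(ids) + [0] * padding)  # 1 = real token, 0 = padding
        type_ids.append([0] * seq_length)            # single-segment input: all zeros
    return {
        "input_word_ids": word_ids,
        "input_mask": mask,
        "input_type_ids": type_ids,
    }


out = toy_preprocess(["nice movie indeed", "I love python programming"])

print(out["input_word_ids"][0][:6])
print(sum(out["input_mask"][0]), sum(out["input_mask"][1]))
```

The mask sums come out as 5 and 6, matching the count in the transcript: three or four words plus the two special tokens.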

Original Description

What is BERT (Bidirectional Encoder Representations From Transformers) and how is it used to solve NLP tasks? This video provides a very simple explanation of it. I am not going to go into the details of how the transformer-based architecture works; instead I will give an overview so that you understand the usage of BERT in NLP tasks. In the coding section we will generate sentence and word embeddings using BERT for some sample text. We will cover various topics such as:
* Word2vec vs BERT
* How BERT is trained on the masked language model and next sentence completion tasks
⭐️ Timestamps ⭐️
00:00 Introduction
00:39 Theory
11:00 Coding in tensorflow
Code: https://github.com/codebasics/deep-learning-keras-tf-tutorial/blob/master/46_BERT_intro/bert_intro.ipynb
BERT article: http://jalammar.github.io/illustrated-bert/
Word2Vec video: https://www.youtube.com/watch?v=hQwFeIupNP0
Do you want to learn technology from me? Check https://codebasics.io/?utm_source=description&utm_medium=yt&utm_campaign=description&utm_id=description for my affordable video courses.
Deep learning playlist: https://www.youtube.com/playlist?list=PLeo1K3hjS3uu7CxAacxVndI4bE_o3BDtO
Machine learning playlist: https://www.youtube.com/playlist?list=PLeo1K3hjS3uvCeTYTeyfe0-rN5r8zn9rw
🔖Hashtags🔖 #bertmodelnlppython #tensorflowbert #tensorflowberttutorial #bert #bertneuralnetwork #bertdeeplearning #whatisbert #bertnlp #bertindeeplearning #bertmodel #bertmodelnlp
🌎 My Website For Video Courses: https://codebasics.io/?utm_source=description&utm_medium=yt&utm_campaign=description&utm_id=description
Need help building software or data analytics and AI solutions? My company https://www.atliq.com/ can help. Click on the Contact button on that website.
🎥 Codebasics Hindi channel: https://www.youtube.com/channel/UCTmFBhuhMibVoSfYom1uXEg
#️⃣ Social Media #️⃣
🔗 Discord: https://discord.gg/r42Kbuk
📸 Dhaval's Personal Instagram: https://www.instagram.com/dhavalsays/
📸 Instagram: https://www.instagram.com/c

This video provides an introduction to BERT, its architecture, and its applications in NLP tasks, with practical examples using TensorFlow, Keras, and Python. It covers the basics of word embeddings and contextualized word embeddings, and mentions tasks such as movie review classification and named entity recognition where BERT embeddings can be used.

Key Takeaways
  1. Locate the BERT model on the TensorFlow Hub website
  2. Copy the encoder URL and the matching preprocessing URL (the encoder download is about 389 MB)
  3. Create a hub layer from the preprocessing URL
  4. Pass the raw sentences through the preprocessing layer to get a dictionary with input_word_ids, input_mask, and input_type_ids
  5. Create a second hub layer from the encoder URL; this is the BERT model
  6. Pass the preprocessed dictionary to the BERT model
  7. Read the sentence embedding from pooled_output and the per-token embeddings from sequence_output
💡 BERT uses contextualized word embeddings to capture the nuances of language, making it a powerful tool for NLP tasks.
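
The steps above can be sketched roughly as follows. This is a hedged outline rather than the notebook from the video: the two TensorFlow Hub URLs are assumptions (the transcript never spells them out, so check the model page for the current ones), the function name `embed_sentences` is hypothetical, and the code needs the `tensorflow`, `tensorflow_hub`, and `tensorflow_text` packages plus a network connection for the download.

```python
# Assumed TensorFlow Hub handles for the uncased BERT base model (12 layers,
# hidden size 768, 12 attention heads) and its matching preprocessing model.
PREPROCESS_URL = "https://tfhub.dev/tensorflow/bert_en_uncased_preprocess/3"
ENCODER_URL = "https://tfhub.dev/tensorflow/bert_en_uncased_L-12_H-768_A-12/4"


def embed_sentences(sentences):
    """Run raw strings through preprocessing and the encoder; return the output dict."""
    # Imported inside the function so this file can be loaded without
    # TensorFlow installed; the heavy work happens only when it is called.
    import tensorflow as tf
    import tensorflow_hub as hub
    import tensorflow_text  # noqa: F401  (registers ops the preprocessing model needs)

    bert_preprocess_model = hub.KerasLayer(PREPROCESS_URL)  # text -> ids, mask, type ids
    bert_model = hub.KerasLayer(ENCODER_URL)                # ids -> embeddings

    text_preprocessed = bert_preprocess_model(tf.constant(sentences))
    return bert_model(text_preprocessed)
```

Calling `embed_sentences(["nice movie indeed", "I love python programming"])` should return a dictionary that includes `pooled_output` (one 768-element vector per sentence), `sequence_output` (one vector per token position), and `encoder_outputs` (one tensor per encoder layer), as described in the transcript. The first call is slow because the model has to be downloaded.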


Chapters (3)

0:00 Introduction
0:39 Theory
11:00 Coding in tensorflow