Language Processing with BERT: The 3 Minute Intro (Deep learning for NLP)
Key Takeaways
This video introduces BERT, a machine learning model for natural language processing, and its applications in search engines, text encoding, summarization, and question answering. It also provides a brief overview of how BERT works and how to build a simple semantic search engine using BERT.
Full Transcript
Hello, I'm Jay, and this is BERT. BERT is a system, a tool that understands language better than any other tool we've had in human history. It's not as good as humans are, but it's able to do some impressive things. It's freely available, so anybody can download it, experiment with it, and use it to build systems. It's incredibly versatile in that it can solve a lot of problems around language.
You have used BERT even without knowing it: if you've used Google Search, you have used BERT. Let's say you go to Google Search and type, say, "Siri technology," and you get the results. Sometimes you see this description here at the top, but you also have the results listed, and a panel on the right here. Sometimes when you click, you see highlighted sections that are very relevant to your search query. I asked people which of these steps used something like BERT, and in fact all of them have. BERT now powers almost every query in the English language, and a lot of other languages as well.
Let's look at some of the tasks BERT helped with here. Here it did text encoding: it was used to retrieve documents based on their similarity to the query you entered. It did other things too. BERT does really well at summarization, so potentially it is what Google uses to turn, say, a Wikipedia page into a short paragraph that summarizes it. Highlighting the parts of a text that are relevant to a query is a task called question answering, and BERT does really well with this one too.
One example of how Google uses BERT in search: before they rolled out BERT, when somebody searched for "brazil traveler to usa," the results would give you pages about USA-to-Brazil travel. The search engine wasn't able to capture the order of the words in a meaningful way, but BERT enables a search engine to understand that context and how words relate to each other, which is very meaningful and important for a search engine.
Another example you may have come across: if you use Gmail or other email clients that give you suggested responses, that's a task called response selection, and BERT does really well at that as well. So this is BERT, and these are only some of the language tasks it's able to do. This has been your very brief introduction to BERT. If you're still interested in how it works, stick around for just a couple more minutes.
BERT takes in language, so we can throw words or a sentence at it. Let's say we want to process the sentence "everybody dance now." We add two special tokens, one before the sentence ([CLS]) and one after it ([SEP]), and BERT outputs something that looks like this: a table, every column of which corresponds to one of the words, so each word in our three-word sentence has its own column. But a lot of use cases don't care about the specific words; they care about the whole sentence. In use cases that care about the sentence, like the example we'll go into now, we tend to look at just the first column. I'll show you how to use that in search, and how you might build your own semantic search engine.
But first, to establish some visual language: I'm not going to copy this table every time. I'll use a shape like this one, where the columns are the same columns, but instead of the 768 rows that BERT uses to represent each word (and the sentence at the beginning), I'll just show three symbolic rows. In your head, you'll know that each of these columns represents a word and is 768 numbers long. Since we'll focus on search, we don't care about the other words; we just care about this first column, because it can be understood as a sentence embedding: a representation of the entire sentence, of all its words. So if you want to use BERT right off the bat, this is the column you can use right away.
All right, let's build a search engine in two slides. To build a search engine, you need a bunch of web pages, so you have a crawler. Let's say we have these three web pages to begin with, for a minimum-viable-product search engine: Hyperion, Dune, and The Matrix. We pass the text in them through BERT and get that column, the [CLS] token representation, which represents the entire document, so each of these documents gets its own vector. Once we've crawled a number of pages and encoded them with BERT, we have this archive before we receive any queries to our search engine; we've just built an index.
Then, when somebody goes to your knockoff search engine and searches for, let's say, "neo the one," we pass that query through BERT and get its [CLS] column, a vector of numbers representing the query. We compare it to each of these three vectors; it's a simple process of multiplication and addition, and each comparison yields a similarity score. Then we show the most similar first. So if The Matrix scores, say, 90 percent, the search results page would list The Matrix first, followed by the other, less relevant documents. That's the end of this example of how to build a semantic search engine. Enjoy counting your billions!
I hope you've enjoyed this very quick intro to BERT. If you want to learn more, I have a lot more details on my blog, linked below. Thank you, and see you in the next video.
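The two-slide search engine described in the transcript can be sketched in Python. This is a minimal sketch under stated assumptions, not the video's implementation: the encoder assumes the Hugging Face `transformers` and `torch` packages and the `bert-base-uncased` checkpoint, and `bert_cls_encoder`, `cosine`, and `SemanticIndex` are hypothetical names chosen for illustration. The "multiplication and addition" comparison is implemented here as cosine similarity (a dot product divided by the vectors' lengths); the video does not name the exact measure. Raw [CLS] vectors from a BERT model that has not been fine-tuned for similarity may give only rough rankings, so treat this as a starting point.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = List[float]


def bert_cls_encoder(model_name: str = "bert-base-uncased") -> Callable[[str], Vector]:
    """Hypothetical helper: returns a function mapping text to BERT's [CLS] column.

    Assumption: `transformers` and `torch` are installed. They are imported
    lazily so the index and similarity code below run without them.
    """
    import torch
    from transformers import AutoModel, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModel.from_pretrained(model_name)
    model.eval()  # inference mode: disables dropout

    def encode(text: str) -> Vector:
        # The tokenizer adds [CLS] at position 0 and [SEP] at the end.
        inputs = tokenizer(text, truncation=True, return_tensors="pt")
        with torch.no_grad():
            hidden = model(**inputs).last_hidden_state  # (1, seq_len, hidden_size)
        return hidden[0, 0].tolist()  # first column: the [CLS] token (768 floats for BERT-base)

    return encode


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    # Multiply-and-add (dot product), normalised by both vector lengths.
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class SemanticIndex:
    """Hypothetical in-memory index: encode pages once, then rank them per query."""

    def __init__(self, encode: Callable[[str], Vector]) -> None:
        self.encode = encode
        self.docs: List[Tuple[str, Vector]] = []

    def add(self, title: str, text: str) -> None:
        # Done ahead of time, before any query arrives (the "archive").
        self.docs.append((title, self.encode(text)))

    def search(self, query: str, top_k: int = 3) -> List[Tuple[str, float]]:
        # Encode the query the same way, score every page, best first.
        q = self.encode(query)
        scored = [(title, cosine(q, vec)) for title, vec in self.docs]
        scored.sort(key=lambda pair: pair[1], reverse=True)
        return scored[:top_k]
```

Usage, assuming the model download succeeds: build `index = SemanticIndex(bert_cls_encoder())`, call `index.add("The Matrix", matrix_page_text)` for each crawled page (where `matrix_page_text` stands for that page's text), and `index.search("neo the one")` returns `(title, score)` pairs, highest score first. Passing the encoder in as a function keeps the ranking logic independent of BERT, so a different embedding model can be swapped in without changing the index.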
Original Description
Since its introduction in 2018, the BERT machine learning model has continued to perform well on a lot of language tasks. This video is a gentle introduction to some of the tasks that BERT can handle (in search engines, for example). The first 3 minutes go over some of its applications. Then the video discusses how the model works at a high level (and how you may use it to build a semantic search engine that is sensitive to the meanings of queries and results).
Introduction (0:00)
You have used BERT (applications) (0:25)
How BERT works (2:52)
Building a search engine (4:30)
------
The Illustrated BERT
http://jalammar.github.io/illustrated-bert/
BERT Paper
https://www.aclweb.org/anthology/N19-1423/
Understanding searches better than ever before
https://blog.google/products/search/search-language-understanding-bert/
Google: BERT now used on almost every English query
https://searchengineland.com/google-bert-used-on-almost-every-english-query-342193
------
Twitter: https://twitter.com/JayAlammar
Blog: https://jalammar.github.io/
Mailing List: https://jayalammar.substack.com/
More videos by Jay:
Explainable AI Cheat Sheet - Five Key Categories
https://www.youtube.com/watch?v=Yg3q5x7yDeM
The Narrated Transformer Language Model
https://youtu.be/-QH8fRhqFHM
Jay's Visual Intro to AI
https://www.youtube.com/watch?v=mSTCzNgDJy4
How GPT-3 Works - Easily Explained with Animations
https://www.youtube.com/watch?v=MQnJZuBGmSQ