Language Processing with BERT: The 3 Minute Intro (Deep learning for NLP)

Jay Alammar · Beginner · 🧠 Large Language Models · 5y ago

Key Takeaways

This video introduces BERT, a machine learning model for natural language processing, and its applications in search engines, text encoding, summarization, and question answering. It also provides a brief overview of how BERT works and how to build a simple semantic search engine using BERT.

Full Transcript

Hello, I'm Jay, and this is BERT. BERT is a system, a tool that understands language better than any other tool we've had in human history. Not as well as humans do, but it's able to do some impressive things. It's freely available for download, so anybody can download it, experiment with it, and use it to build systems. It's incredibly versatile in that it can solve a lot of problems around language.

You have used BERT even without knowing it: if you've used Google Search, you have used BERT. Let's say you go to Google Search and you want to search for something, so you type, let's say, "Siri technology", and then you get the results. Sometimes you see this description here at the top, but then you also have the results listed on the right here. Sometimes when you click, you see these highlighted sections that are very relevant to your search query. I asked people which one of these steps used something like BERT, and in fact all of them have used BERT. BERT is now powering almost every query in the English language, and a lot of other languages as well.

So let's look at some of the tasks that BERT helped with here. Here it did text encoding, and it was used to retrieve documents based on similarity to the query that you entered. It did other things too. This is summarization. BERT does really well at summarization, so potentially it is what Google uses for summarization that turns, let's say, a Wikipedia page into a short paragraph that summarizes it. This one, highlighting the parts of a text that are relevant to, let's say, a query, is something called question answering, and BERT does really well with this as well.

One example of how Google uses BERT in search: before they rolled out BERT, when somebody searched for "brazil traveler to usa", the results would give you some pages about USA-to-Brazil travel. So it wasn't able to capture the order of the words in a meaningful way. But BERT enables a search engine to understand that context and how words are related to each other, which is very meaningful and important for a search engine.

Another example that you may have come across, if you use Gmail or other email clients, is these suggested responses. This is a task called response selection, and BERT does really well at that as well. So this is BERT, and these are only some of the language tasks that it's able to do. This has been your very brief introduction to BERT.

If you're still interested in how it works, I can tell you; stick around just a couple more minutes. BERT takes in language, so we can throw words at it, or a sentence. Let's say we want to process the sentence "everybody dance now". We just put in this CLS token and these two words, let's say one before and one after it, and BERT outputs something that looks like this. It's a table, every column of which corresponds to one of the words. So in our three-word sentence here, each word has its own column. But a lot of use cases don't care about the specific words; they care about the whole sentence. And if we are in a use case that cares about a sentence, like the example we'll go into now, we tend to look at just the first column. I'll give you an example of how to use that in search, and how maybe you can build your own sort of semantic search engine.

But before that, to establish some visual language: I'm not going to copy over this table every time. I just love to use a shape like this, where the columns are these columns, and instead of the 768 rows that BERT uses to represent each word and the sentence at the beginning, I'm just going to show these symbolic three rows. But in your head you will know that each one of these columns represents a word and that it's of this length. Since we'll focus on search, we really don't care about these other words. We just care about this first column, because this can be understood as a sentence embedding: it's a representation of the entire sentence, of all the words. And so if you want to use BERT right off the bat, this is the column that we can use right away.

All right, let's build a search engine in two slides. To build a search engine you have to have a bunch of web pages, so you have a crawler. Let's say we have these three web pages to begin with; we'll do a minimum viable product search engine. So: Hyperion, Dune, The Matrix. We pass the text in them through BERT and we get that column, that CLS token representation that represents the entire document, and so each one of these documents would have its own vector. Then, after we've crawled a number of pages and encoded them via BERT, we would have this archive. Before we receive any queries to our search engine, we've just built an index here.

Then, when somebody goes to your knockoff search engine and searches for, let's say, "neo the one", we pass that sentence through BERT and we get that CLS column, that vector of numbers representing this query. We just compare it (it's a simple multiplication-and-addition process) to each one of these three, and that comparison yields a similarity score, and we just show the most similar first. So if it's, like, 90 percent, this would be the order of the results in the search results page: The Matrix first, and then the other non-relevant, or let's say less relevant, documents.

So this is the end of this example of how to build a semantic search engine. Enjoy counting your billions. I hope you've enjoyed this very quick, brief intro to BERT. If you want to learn more, I have a lot more details on my blog; it's linked down in the comments below. Thank you, and see you in the next video.
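
The index-then-compare procedure described in the transcript can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the code behind the video: the three-number vectors are invented stand-ins for the 768-number CLS vectors that BERT would produce, the page titles are illustrative, and the names `INDEX`, `dot`, `cosine_similarity`, and `search` are hypothetical. The transcript only says the comparison is a "multiplication and addition process", which describes a dot product; scaling it into a cosine similarity is an assumption made here so that scores are comparable across documents.

```python
import math

# Hypothetical stand-ins for BERT's CLS vectors, one per crawled page.
# Real BERT-base vectors hold 768 numbers; these 3-number values are invented.
INDEX = {
    "Hyperion": [0.9, 0.1, 0.2],
    "Dune": [0.8, 0.3, 0.1],
    "The Matrix": [0.1, 0.9, 0.4],
}


def dot(a, b):
    """Multiply matching entries and add the products together."""
    return sum(x * y for x, y in zip(a, b))


def cosine_similarity(a, b):
    """Dot product divided by both vector lengths (assumed comparison)."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


def search(query_vector, index):
    """Score every indexed page against the query, most similar first."""
    scored = [
        (cosine_similarity(query_vector, vector), title)
        for title, vector in index.items()
    ]
    return sorted(scored, reverse=True)


# Invented vector standing in for the encoded query "neo the one".
query = [0.2, 0.8, 0.5]
results = search(query, INDEX)
for score, title in results:
    print(f"{score:.2f}  {title}")
```

In a real system the vectors in `INDEX` and `query` would come from passing each page and each query through BERT and keeping the first (CLS) column; everything after that step would stay the same shape as this sketch.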

Original Description

Since its introduction in 2018, the BERT machine learning model has continued to perform well in a lot of language tasks. This video is a gentle introduction to some of the tasks that BERT can handle (in search engines, for example). The first 3 minutes go over some of its applications. Then the video discusses how the model works at a high level (and how you may use it to build a semantic search engine which is sensitive to the meanings of queries and results).

Introduction (0:00)
You have used BERT (applications) (0:25)
How BERT works (2:52)
Building a search engine (4:30)

------
The Illustrated BERT
http://jalammar.github.io/illustrated-bert/
BERT Paper:
https://www.aclweb.org/anthology/N19-1423/
Understanding searches better than ever before
https://blog.google/products/search/search-language-understanding-bert/
Google: BERT now used on almost every English query
https://searchengineland.com/google-bert-used-on-almost-every-english-query-342193

------
Twitter: https://twitter.com/JayAlammar
Blog: https://jalammar.github.io/
Mailing List: https://jayalammar.substack.com/

More videos by Jay:
Explainable AI Cheat Sheet - Five Key Categories
https://www.youtube.com/watch?v=Yg3q5x7yDeM
The Narrated Transformer Language Model
https://youtu.be/-QH8fRhqFHM
Jay's Visual Intro to AI
https://www.youtube.com/watch?v=mSTCzNgDJy4
How GPT-3 Works - Easily Explained with Animations
https://www.youtube.com/watch?v=MQnJZuBGmSQ

This video introduces BERT and its applications in NLP, and provides a brief overview of how to build a simple semantic search engine using BERT. BERT is a powerful language model that can be used for a variety of tasks, including text encoding, summarization, and question answering. By understanding how BERT works and how to apply it to simple NLP tasks, viewers can gain a deeper understanding of the capabilities and limitations of this technology.

Key Takeaways
  1. Download and install BERT
  2. Use BERT for text encoding and summarization
  3. Build a simple semantic search engine using BERT
  4. Apply BERT to other NLP tasks, such as question answering
💡 BERT's ability to understand context and the relationships between words makes it a valuable tool for search engines and other NLP applications.
