What makes LLM tokenizers different from each other? GPT4 vs. FlanT5 Vs. Starcoder Vs. BERT and more

Jay Alammar · Beginner · 🧠 Large Language Models · 2y ago
Tokenizers are one of the key components of Large Language Models (LLMs). One of the best ways to understand what they do is to compare the behavior of different tokenizers. In this video, Jay takes a carefully crafted piece of text (containing English, code, indentation, numbers, emoji, and other languages) and passes it through different trained tokenizers to reveal what each succeeds and fails at encoding, the design choices behind each tokenizer, and what those choices say about the respective models.

Chapters (9)

0:00 Introduction
1:25 The carefully polished text to test tokenizers
2:19 BERT Uncased
3:59 BERT Cased
4:29 GPT-2
6:00 FLAN-T5
7:00 GPT-4
9:24 Starcoder
21:31 Galactica
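
The side-by-side comparison the video walks through can be approximated with a small script. This is a minimal sketch, not the code used in the video: `SAMPLE_TEXT` is a hypothetical stand-in for the test text, the helper names are made up for illustration, and the two toy tokenizers in the demo only mimic the cased versus uncased distinction. The optional `load_hf_tokenizers` helper assumes the Hugging Face `transformers` package is installed and that checkpoints such as `bert-base-uncased`, `bert-base-cased`, and `gpt2` can be downloaded.

```python
from typing import Callable, Dict, List

Tokenize = Callable[[str], List[str]]

# Hypothetical sample text (not the one from the video): mixed case,
# indented code, a number, and an emoji.
SAMPLE_TEXT = "English and CAPITALS\n    if x == 12:\n        print('🎵')"


def compare_tokenizers(text: str, tokenizers: Dict[str, Tokenize]) -> Dict[str, List[str]]:
    """Run the same text through every tokenizer and collect the tokens."""
    return {name: tokenize(text) for name, tokenize in tokenizers.items()}


def load_hf_tokenizers(model_ids: List[str]) -> Dict[str, Tokenize]:
    """Load real tokenizers from the Hugging Face Hub.

    Assumption: `transformers` is installed and the model ids are reachable.
    The import is lazy so the rest of the sketch runs without it.
    """
    from transformers import AutoTokenizer

    return {m: AutoTokenizer.from_pretrained(m).tokenize for m in model_ids}


def print_report(results: Dict[str, List[str]]) -> None:
    """Print the token count and the tokens for each tokenizer."""
    for name, tokens in results.items():
        print(f"{name}: {len(tokens)} tokens")
        print("  " + " | ".join(tokens))


# Toy stand-ins so the demo runs offline. They split on whitespace only and
# are NOT real subword tokenizers; both discard the indentation.
toy_tokenizers: Dict[str, Tokenize] = {
    "toy-cased": str.split,
    "toy-uncased": lambda s: s.lower().split(),
}

print_report(compare_tokenizers(SAMPLE_TEXT, toy_tokenizers))
```

To compare real tokenizers, pass `load_hf_tokenizers(["bert-base-uncased", "bert-base-cased", "gpt2"])` in place of `toy_tokenizers`. Keeping the comparison function independent of any one library makes it easy to add other tokenizers as plain callables.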

Playlist

Uploads from Jay Alammar · Jay Alammar · 38 of 42

1 Jay's Visual Intro to AI
2 Making Money from AI by Predicting Sales - Jay's Intro to AI Part 2
3 How GPT3 Works - Easily Explained with Animations
4 The Narrated Transformer Language Model
5 My Visualization Tools (my Apple Keynote setup for visualizations and animations)
6 Explainable AI Cheat Sheet - Five Key Categories
7 The Unreasonable Effectiveness of RNNs (Article and Visualization Commentary) [2015 article]
8 Neural Activations & Dataset Examples
9 Up and Down the Ladder of Abstraction [interactive article by Bret Victor, 2011]
10 Probing Classifiers: A Gentle Intro (Explainable AI for Deep Learning)
11 Inspecting Neural Networks with CCA - A Gentle Intro (Explainable AI for Deep Learning)
12 Language Processing with BERT: The 3 Minute Intro (Deep learning for NLP)
13 Behavioral Testing of ML Models (Unit tests for machine learning)
14 Favorite AI/ML Books: Intro to ML with Python (Book Review)
15 Favorite Python Books: Effective Python
16 Seeing Voices: 1 - Intro to Spectrograms
17 Favorite Stats Books: Seven Pillars of Statistical Wisdom
18 Understanding Animal Languages - Seeing Voices 2
19 AI is NOT smart robots #shorts
20 How digital assistants like Siri work #shorts
21 Writing Code in Jupyter Notebooks #shorts
22 Experience Grounds Language: Improving language models beyond the world of text
23 pandas for data science in python #shorts
24 The Illustrated Retrieval Transformer
25 AI Image Generation is MIND BLOWING! #shorts
26 A Generalist Agent (Gato) - DeepMind's single model learns 600 tasks
27 The Illustrated Word2vec - A Gentle Intro to Word Embeddings in Machine Learning
28 Nemesis 2 Intro Remade with AI Generated Images
29 AI Art Explained: How AI Generates Images (Stable Diffusion, Midjourney, and DALLE)
30 What is Generative AI? 4 Important Things to Know (about ChatGPT, MidJourney, Cohere & future AIs)
31 AI is Eating The World - This is Where YOU Can Use it to Compete (AI Product Moats)
32 What is LangChain? Where does it fit with LLMs like ChatGPT and Cohere? #shorts
33 Are language models with more parameters better? #shorts #chatgpt
34 How to manage LLM prompts with tools like LangChain #languagemodels #chatgpt
35 What is Llama Index? how does it help in building LLM applications? #languagemodels #chatgpt
36 prompt chains are important for building large language model applications
37 ChatGPT has Never Seen a SINGLE Word (Despite Reading Most of The Internet). Meet LLM Tokenizers.
38 What makes LLM tokenizers different from each other? GPT4 vs. FlanT5 Vs. Starcoder Vs. BERT and more
39 Building LLM Agents with Tool Use
40 SWE-Bench authors reflect on the state of LLM agents at Neurips 2024
41 AlphaXiv - a great place to discuss ML papers
42 Learn how ChatGPT and DeepSeek models work: How Transformer LLMs Work [Free Course]