TOKENIZATION: How AI models turn text into numbers | Byte-Pair Encoding
Large Language Models don't actually understand language—they understand numbers. But how do we turn words into numbers? In this video, I break down the fascinating process of tokenization and byte-pair encoding (BPE), the foundation of how modern AI models like ChatGPT process text.
We'll explore:
- Why AI models have vocabulary limits (and why it matters)
- Byte-pair encoding (BPE)
- How AI handles multiple languages, slang, emoji, typos, and made-up words
- Thinking in bytes, not characters
- How tokens become embeddings (the actual numbers AI uses)
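The core BPE idea covered above can be sketched in a few lines of Python: repeatedly find the most frequent adjacent pair of symbols in a corpus and merge it into a new token. This is a toy illustration under assumed inputs (the `bpe_merges` function name and the tiny corpus are invented for this example), not the video's actual code or a production tokenizer:

```python
from collections import Counter

def bpe_merges(words, num_merges):
    """Learn BPE merge rules from a toy corpus.

    `words` maps each word (a tuple of symbols) to its frequency.
    Returns the list of learned merges, most frequent first.
    """
    vocab = dict(words)
    merges = []
    for _ in range(num_merges):
        # Count how often each adjacent symbol pair occurs across the corpus.
        pairs = Counter()
        for word, freq in vocab.items():
            for a, b in zip(word, word[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break
        best = max(pairs, key=pairs.get)  # most frequent pair becomes one token
        merges.append(best)
        # Rewrite every word, replacing the pair with the merged symbol.
        new_vocab = {}
        for word, freq in vocab.items():
            merged, i = [], 0
            while i < len(word):
                if i < len(word) - 1 and (word[i], word[i + 1]) == best:
                    merged.append(word[i] + word[i + 1])
                    i += 2
                else:
                    merged.append(word[i])
                    i += 1
            new_vocab[tuple(merged)] = freq
        vocab = new_vocab
    return merges

# Toy corpus: "low" x5, "lower" x2, "newest" x6
corpus = {tuple("low"): 5, tuple("lower"): 2, tuple("newest"): 6}
print(bpe_merges(corpus, 3))  # → [('w', 'e'), ('l', 'o'), ('n', 'e')]
```

Real tokenizers (like the ones behind ChatGPT) apply the same loop over raw bytes rather than characters, which is why they never hit an "unknown word" for typos, emoji, or made-up words.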
Whether you're curious about LLMs, learning…
DeepCamp AI