TOKENIZATION: How AI models turn text into numbers | Byte-Pair Encoding
Large Language Models don't actually understand language—they understand numbers. But how do we turn words into numbers? In this video, I break down the fascinating process of tokenization and byte-pair encoding (BPE), the foundation of how modern AI models like ChatGPT process text.
We'll explore:
- Why AI models have vocabulary limits (and why it matters)
- Byte-pair encoding (BPE)
- How AI handles multiple languages, slang, emoji, typos, and made-up words
- Thinking in bytes, not characters
- How tokens become embeddings (the actual numbers AI uses)
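The core BPE idea covered above can be sketched in a few lines of Python: repeatedly find the most frequent adjacent pair of symbols in a corpus and merge it into a new token. This is a toy illustration under assumed inputs (the `bpe_merges` function name and the tiny corpus are invented for this example), not the video's actual code or a production tokenizer:

```python
from collections import Counter

def bpe_merges(words, num_merges):
    """Learn BPE merge rules from a toy corpus.

    `words` maps each word (a tuple of symbols) to its frequency.
    Returns the list of learned merges, most frequent first.
    """
    vocab = dict(words)
    merges = []
    for _ in range(num_merges):
        # Count how often each adjacent symbol pair occurs across the corpus.
        pairs = Counter()
        for word, freq in vocab.items():
            for a, b in zip(word, word[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break
        best = max(pairs, key=pairs.get)  # most frequent pair becomes one token
        merges.append(best)
        # Rewrite every word, replacing the pair with the merged symbol.
        new_vocab = {}
        for word, freq in vocab.items():
            merged, i = [], 0
            while i < len(word):
                if i < len(word) - 1 and (word[i], word[i + 1]) == best:
                    merged.append(word[i] + word[i + 1])
                    i += 2
                else:
                    merged.append(word[i])
                    i += 1
            new_vocab[tuple(merged)] = freq
        vocab = new_vocab
    return merges

# Toy corpus: "low" x5, "lower" x2, "newest" x6
corpus = {tuple("low"): 5, tuple("lower"): 2, tuple("newest"): 6}
print(bpe_merges(corpus, 3))  # → [('w', 'e'), ('l', 'o'), ('n', 'e')]
```

Real tokenizers (like the ones behind ChatGPT) apply the same loop over raw bytes rather than characters, which is why they never hit an "unknown word" for typos, emoji, or made-up words.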
Whether you're curious about LLMs, learning…
DeepCamp AI