Tokenization and Byte Pair Encoding

Name: Tokenization and Byte Pair Encoding
Uploaded: 2025-12-27T18:31:20+00:00
Channel: Serrano.Academy

Serrano.Academy · Beginner · 🧠 Large Language Models · 3mo ago

LLMs don't process words; they process tokens. What are tokens? They are groups of characters that split words into smaller, reusable pieces. Good tokenization is essential for training a well-performing LLM. In this video, you'll learn about tokenization and one of its most common methods: byte-pair encoding (BPE). To see the whole LLM course, click here: https://www.serrano.academy/large-language-models
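To make the idea concrete, here is a minimal sketch of character-level BPE in Python. It is not taken from the video: the function names (`merge_pair`, `train_bpe`, `tokenize`) and the toy word list are made up for illustration, and production tokenizers typically work on bytes and handle spaces and word boundaries, which this sketch skips. The core loop is the standard one: count adjacent symbol pairs, merge the most frequent pair into a new token, and repeat.

```python
from collections import Counter


def merge_pair(symbols, pair):
    """Replace every adjacent occurrence of `pair` in `symbols` with one merged symbol."""
    merged = []
    i = 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            merged.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            merged.append(symbols[i])
            i += 1
    return tuple(merged)


def train_bpe(words, num_merges):
    """Learn up to `num_merges` merge rules from a list of words (hypothetical helper)."""
    # Start with every word split into single characters, keeping word frequencies.
    vocab = Counter(tuple(word) for word in words)
    merges = []
    for _ in range(num_merges):
        # Count adjacent symbol pairs, weighted by how often each word occurs.
        pair_counts = Counter()
        for symbols, freq in vocab.items():
            for pair in zip(symbols, symbols[1:]):
                pair_counts[pair] += freq
        if not pair_counts:
            break  # every word is already a single token
        # Pick the most frequent pair (ties go to the pair seen first).
        best = max(pair_counts, key=pair_counts.get)
        merges.append(best)
        # Apply the new merge rule to every word in the vocabulary.
        new_vocab = Counter()
        for symbols, freq in vocab.items():
            new_vocab[merge_pair(symbols, best)] += freq
        vocab = new_vocab
    return merges


def tokenize(word, merges):
    """Split `word` into tokens by replaying the learned merges in order."""
    symbols = tuple(word)
    for pair in merges:
        symbols = merge_pair(symbols, pair)
    return list(symbols)


# Toy corpus, chosen only for illustration.
corpus = ["hug", "hug", "hug", "pug", "pun", "bun", "hugs"]
merges = train_bpe(corpus, num_merges=3)
print(merges)
print(tokenize("hugs", merges))
print(tokenize("bug", merges))
```

Note that "bug" never appears in the toy corpus, yet it can still be tokenized from pieces learned on other words. That ability to fall back on smaller known pieces is the main reason subword tokenization is preferred over a fixed word list.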

