Tokenization and Byte Pair Encoding
LLMs don't process words; they process tokens. What are tokens? They are groups of characters that break words down in a logical way. Good tokenization is essential for training a well-performing LLM.
In this video, you'll learn about tokenization and one of its most common methods: byte-pair encoding (BPE).
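To give a flavor of what the video covers, here is a minimal sketch of the BPE training loop: start from words split into characters, repeatedly find the most frequent adjacent pair of symbols, and merge it into a new symbol. The toy corpus and the number of merges below are made up for illustration.

```python
from collections import Counter

def get_pair_counts(vocab):
    """Count adjacent symbol pairs across the corpus, weighted by word frequency."""
    pairs = Counter()
    for word, freq in vocab.items():
        symbols = word.split()
        for a, b in zip(symbols, symbols[1:]):
            pairs[(a, b)] += freq
    return pairs

def merge_pair(pair, vocab):
    """Replace every occurrence of the pair with a single merged symbol."""
    spaced = " ".join(pair)
    joined = "".join(pair)
    return {word.replace(spaced, joined): freq for word, freq in vocab.items()}

# Toy corpus: each word is pre-split into characters, mapped to its frequency.
vocab = {"l o w": 5, "l o w e r": 2, "n e w e s t": 6, "w i d e s t": 3}

merges = []
for _ in range(3):  # illustrative number of merge steps
    pairs = get_pair_counts(vocab)
    best = max(pairs, key=pairs.get)  # most frequent adjacent pair
    vocab = merge_pair(best, vocab)
    merges.append(best)

print(merges)  # learned merge rules, in order
print(vocab)   # corpus rewritten with the merged symbols
```

The learned merge rules are exactly what a BPE tokenizer later replays, in order, to tokenize new text.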
To see the whole LLM course, click here!
https://www.serrano.academy/large-language-models
DeepCamp AI