How Tokenization, Inference, & LLMs Actually Work

Mark Hennings · Intermediate · 🧠 Large Language Models · 2y ago
In this video, I explain how language models generate text, why most of the process is actually deterministic (not random), and how you can shape the probability distribution used to select the next token with parameters like temperature and top-p. I cover temperature in depth and demonstrate with a spreadsheet how different values change the probabilities. If you'd like to play with the temperature calculator s…
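The temperature and top-p mechanics described in the video can be sketched in a few lines of Python. This is an illustrative sketch, not the video's spreadsheet: the logit values and the 0.9 cutoff are made up for demonstration. Lower temperature sharpens the distribution toward the most likely token; top-p (nucleus) sampling keeps only the smallest set of tokens whose cumulative probability reaches p.

```python
import math

def softmax_with_temperature(logits, temperature=1.0):
    """Convert raw logits to probabilities, scaled by temperature.
    Lower temperature sharpens the distribution; higher flattens it."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def top_p_filter(probs, p=0.9):
    """Keep the smallest set of tokens whose cumulative probability
    reaches p, then renormalize. Returns (index, prob) pairs."""
    ranked = sorted(enumerate(probs), key=lambda kv: kv[1], reverse=True)
    kept, cumulative = [], 0.0
    for idx, prob in ranked:
        kept.append((idx, prob))
        cumulative += prob
        if cumulative >= p:
            break
    total = sum(prob for _, prob in kept)
    return [(idx, prob / total) for idx, prob in kept]

# Hypothetical next-token logits (not from the video).
logits = [4.0, 3.0, 2.0, 0.5]
for t in (0.5, 1.0, 2.0):
    print(t, [round(x, 3) for x in softmax_with_temperature(logits, t)])
print(top_p_filter(softmax_with_temperature(logits), p=0.9))
```

Running this shows the key behavior: at temperature 0.5 the top token dominates even more, at 2.0 the probabilities flatten toward each other, and the top-p filter drops the long tail before sampling.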
Watch on YouTube ↗

Chapters (7)

0:10 Tokens & Why They Matter
3:27 Special Tokens
4:35 The Inference Loop
7:26 Random or Not?
8:11 Deep Dive into Temperature
14:19 Tips for Setting Temperature
16:11 Top P