Why GPT Isn’t Creative — But Feels Like It Is
When an LLM finishes training, it doesn't output a single answer; it outputs a probability distribution over all possible next tokens.
The final behavior of the model depends entirely on how we sample from that distribution.
In this video, we explain the three most important sampling techniques used in modern LLMs:
- Temperature: how probability distributions are sharpened or flattened
- Top-k sampling: restricting generation to the k most likely tokens
- Top-p (nucleus sampling): dynamically selecting tokens based on cumulative probability
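The three techniques above can be sketched in a few lines of NumPy. This is a minimal illustration, not code from the video; the function name and masking details are my own choices:

```python
import numpy as np

def sample_next_token(logits, temperature=1.0, top_k=None, top_p=None, rng=None):
    """Sample a token id from raw logits using temperature, top-k, and top-p."""
    rng = rng or np.random.default_rng()
    # Temperature: divide logits before softmax; T < 1 sharpens, T > 1 flattens.
    logits = np.asarray(logits, dtype=np.float64) / temperature
    # Numerically stable softmax.
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    if top_k is not None:
        # Keep only the k most probable tokens, zero out the rest, renormalize.
        kth_largest = np.sort(probs)[-top_k]
        probs = np.where(probs >= kth_largest, probs, 0.0)
        probs /= probs.sum()
    if top_p is not None:
        # Nucleus: keep the smallest set of tokens whose cumulative
        # probability reaches top_p, then renormalize.
        order = np.argsort(probs)[::-1]
        cumulative = np.cumsum(probs[order])
        cutoff = np.searchsorted(cumulative, top_p) + 1  # include boundary token
        mask = np.zeros_like(probs)
        mask[order[:cutoff]] = 1.0
        probs *= mask
        probs /= probs.sum()
    return int(rng.choice(len(probs), p=probs))
```

For example, `top_k=1` collapses sampling to greedy decoding, while a very low `top_p` has the same effect because the nucleus shrinks to the single most likely token.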
We walk through the exact math, int…
DeepCamp AI