Let's reproduce GPT-2 (124M)

Andrej Karpathy · Advanced · 🧠 Large Language Models · 1y ago
We reproduce GPT-2 (124M) from scratch. This video covers the whole process: first we build the GPT-2 network, then we optimize its training to be really fast, then we set up the training run following the GPT-2 and GPT-3 papers and their hyperparameters, then we hit run and come back the next morning to see our results and enjoy some amusing model generations. Keep in mind that in some places this video builds on knowledge from earlier videos in the Zero to Hero Playlist (see my channel). You could also see this video as building my nanoGPT repo, which by the end is about 90% similar…

Chapters (13)

intro: Let’s reproduce GPT-2 (124M)
3:39 exploring the GPT-2 (124M) OpenAI checkpoint
13:47 SECTION 1: implementing the GPT-2 nn.Module
28:08 loading the huggingface/GPT-2 parameters
31:00 implementing the forward pass to get logits
33:31 sampling init, prefix tokens, tokenization
37:02 sampling loop
41:47 sample, auto-detect the device
45:50 let’s train: data batches (B,T) → logits (B,T,C)
52:53 cross entropy loss
56:42 optimization loop: overfit a single batch
1:02:00 data loader lite
1:06:14 paramet