Building makemore Part 4: Becoming a Backprop Ninja

Andrej Karpathy · Beginner · 📄 Research Papers Explained · 3y ago
We take the 2-layer MLP (with BatchNorm) from the previous video and backpropagate through it manually, without using PyTorch autograd's loss.backward(): through the cross-entropy loss, the 2nd linear layer, tanh, BatchNorm, the 1st linear layer, and the embedding table. Along the way we build a strong intuitive understanding of how gradients flow backwards through the compute graph, at the level of efficient tensors rather than individual scalars as in micrograd. This builds competence and intuition around how neural nets are optimized, and sets you up to more confidently innovate on and debu…
Watch on YouTube ↗
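The core exercise of the video is writing each backward step by hand and checking it against autograd. As a minimal sketch of that workflow (with made-up shapes and a stand-in loss, not the video's actual network), here is a linear layer followed by tanh, backpropagated manually via the chain rule and verified against loss.backward():

```python
import torch

torch.manual_seed(42)

# Hypothetical tiny setup: a linear layer, a tanh, and a stand-in scalar loss.
x = torch.randn(4, 3)
W = torch.randn(3, 2, requires_grad=True)
b = torch.randn(2, requires_grad=True)

z = x @ W + b           # linear layer
h = torch.tanh(z)       # tanh nonlinearity
loss = (h ** 2).mean()  # stand-in scalar loss (mean of squares)

loss.backward()         # PyTorch autograd, used only to verify our work

# Manual backward pass, node by node, exactly as in the exercise:
dh = 2 * h / h.numel()   # d(loss)/dh for the mean of h^2
dz = dh * (1 - h ** 2)   # tanh'(z) = 1 - tanh(z)^2
dW = x.T @ dz            # weight gradient: (3,4) @ (4,2) -> (3,2)
db = dz.sum(0)           # bias gradient sums over the batch dimension

# Both should match autograd's results exactly (up to float tolerance).
print(torch.allclose(dW, W.grad), torch.allclose(db, b.grad))
```

The same compare-against-autograd pattern is what the video applies layer by layer, all the way back through BatchNorm and the embedding table.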

Chapters (5)

0:00 intro: why you should care & fun history
7:26 starter code
13:01 exercise 1: backpropping the atomic compute graph
1:05:17 brief digression: Bessel's correction in BatchNorm
1:26:31 exercise 2