Building makemore Part 5: Building a WaveNet
We take the 2-layer MLP from the previous video and make it deeper with a tree-like structure, arriving at a convolutional neural network architecture similar to WaveNet (2016) from DeepMind. In the WaveNet paper, the same hierarchical architecture is implemented more efficiently using causal dilated convolutions (not yet covered). Along the way we get a better sense of what torch.nn is and how it works under the hood, and of what a typical deep learning development process looks like (a lot of reading of documentation, keeping track of multidimensional tensor shapes, moving between jupyte…
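The tree-like structure described above can be illustrated with a small shape-tracing sketch. It is not the code from the video: it uses plain Python lists instead of PyTorch tensors, and the names `fuse_consecutive` and `trace_tree`, the toy embedding values, and the embedding size of 2 are all hypothetical choices made for illustration. The idea it shows is that, with a context of 8 characters, fusing neighbouring positions in pairs takes three levels (8 → 4 → 2 → 1) rather than squashing all 8 at once.

```python
def fuse_consecutive(seq, n=2):
    """Concatenate every n consecutive vectors in seq into one longer vector."""
    if len(seq) % n != 0:
        raise ValueError("sequence length must be divisible by n")
    fused = []
    for i in range(0, len(seq), n):
        merged = []
        for vec in seq[i:i + n]:
            merged.extend(vec)  # join the features of neighbouring positions
        fused.append(merged)
    return fused


def trace_tree(block_size=8, emb_dim=2, n=2):
    """Return the (positions, features) shape after each fusing level."""
    # Hypothetical toy embeddings: position t holds emb_dim copies of t.
    seq = [[float(t)] * emb_dim for t in range(block_size)]
    shapes = [(len(seq), len(seq[0]))]
    while len(seq) > 1:
        seq = fuse_consecutive(seq, n)
        shapes.append((len(seq), len(seq[0])))
    return shapes


shapes = trace_tree()
print(shapes)  # [(8, 2), (4, 4), (2, 8), (1, 16)]
```

The sketch contains no learned layers, so the feature width doubles at every level. In an actual network a linear layer (plus normalization and a nonlinearity) would typically sit between fusing steps and map the concatenated features back to a fixed hidden size.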
Chapters (18)
0:00 intro
1:40 starter code walkthrough
6:56 let’s fix the learning rate plot
9:16 pytorchifying our code: layers, containers, torch.nn, fun bugs
17:11 overview: WaveNet
19:33 dataset bump the context size to 8
19:55 re-running baseline code on block_size 8
21:36 implementing WaveNet
37:41 training the WaveNet: first pass
38:50 fixing batchnorm1d bug
45:21 re-training WaveNet with bug fix
46:07 scaling up our WaveNet
46:58 experimental harness
47:44 WaveNet but with “dilated causal convolutions”
51:34 torch.nn
52:28 the development process of building deep neural nets
54:17 going forward
55:26 improve on my loss! how far can we
Playlist
Uploads from Andrej Karpathy · Andrej Karpathy · 11 of 17
1. Stable diffusion dreams of steam punk neural networks
2. Stable diffusion dreams of "blueberry spaghetti" for one night
3. The spelled-out intro to neural networks and backpropagation: building micrograd
4. Stable diffusion dreams of tattoos
5. Stable diffusion dreams of steampunk brains
6. Stable diffusion dreams of psychedelic faces
7. The spelled-out intro to language modeling: building makemore
8. Building makemore Part 2: MLP
9. Building makemore Part 3: Activations & Gradients, BatchNorm
10. Building makemore Part 4: Becoming a Backprop Ninja
11. ▶ Building makemore Part 5: Building a WaveNet
12. Let's build GPT: from scratch, in code, spelled out.
13. [1hr Talk] Intro to Large Language Models
14. Let's build the GPT Tokenizer
15. Let's reproduce GPT-2 (124M)
16. Deep Dive into LLMs like ChatGPT
17. How I use LLMs