Building makemore Part 2: MLP
We implement a multilayer perceptron (MLP) character-level language model. In this video we also introduce many basics of machine learning (e.g. model training, learning rate tuning, hyperparameters, evaluation, train/dev/test splits, under/overfitting, etc.).
Links:
- makemore on github: https://github.com/karpathy/makemore
- jupyter notebook I built in this video: https://github.com/karpathy/nn-zero-to-hero/blob/master/lectures/makemore/makemore_part2_mlp.ipynb
- collab notebook (new)!!!: https://colab.research.google.com/drive/1YIfmkftLrz6MPTOO9Vwqrop2Q5llHIGK?usp=sharing
- Bengio et al. 2…
Watch on YouTube ↗
(saves to browser)
Chapters (10)
intro
1:48
Bengio et al. 2003 (MLP language model) paper walkthrough
9:03
(re-)building our training dataset
12:19
implementing the embedding lookup table
18:35
implementing the hidden layer + internals of torch.Tensor: storage, views
29:15
implementing the output layer
29:53
implementing the negative log likelihood loss
32:17
summary of the full network
32:49
introducing F.cross_entropy and why
37:56
implementing th
Playlist
Uploads from Andrej Karpathy · Andrej Karpathy · 8 of 17
1
2
3
4
5
6
7
▶
9
10
11
12
13
14
15
16
17
Stable diffusion dreams of steam punk neural networks
Andrej Karpathy
Stable diffusion dreams of "blueberry spaghetti" for one night
Andrej Karpathy
The spelled-out intro to neural networks and backpropagation: building micrograd
Andrej Karpathy
Stable diffusion dreams of tattoos
Andrej Karpathy
Stable diffusion dreams of steampunk brains
Andrej Karpathy
Stable diffusion dreams of psychedelic faces
Andrej Karpathy
The spelled-out intro to language modeling: building makemore
Andrej Karpathy
Building makemore Part 2: MLP
Andrej Karpathy
Building makemore Part 3: Activations & Gradients, BatchNorm
Andrej Karpathy
Building makemore Part 4: Becoming a Backprop Ninja
Andrej Karpathy
Building makemore Part 5: Building a WaveNet
Andrej Karpathy
Let's build GPT: from scratch, in code, spelled out.
Andrej Karpathy
[1hr Talk] Intro to Large Language Models
Andrej Karpathy
Let's build the GPT Tokenizer
Andrej Karpathy
Let's reproduce GPT-2 (124M)
Andrej Karpathy
Deep Dive into LLMs like ChatGPT
Andrej Karpathy
How I use LLMs
Andrej Karpathy
DeepCamp AI