What I Learned From Implementing LLM Architectures From Scratch (And How to Get Started)
Skills:
LLM Engineering · 90%
LLM Architecture Gallery: https://llm-gallery.com
In this talk, I discuss what we can learn from implementing LLM architectures from scratch in Python and PyTorch.
The main idea is that to really understand how modern LLMs work, it helps to inspect the actual implementation details: attention variants, normalization layers, configuration files, KV cache optimizations, and the small architectural choices that often make a model work correctly.
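One such detail is RMSNorm, the LayerNorm replacement used by many recent open-weight models. Below is a minimal, generic PyTorch sketch for illustration; real models differ in small ways (some variants scale by 1 + weight or compute the normalization in float32), and the Gemma 3 RMSNorm example in the talk looks at exactly that kind of discrepancy.

```python
import torch
import torch.nn as nn

class RMSNorm(nn.Module):
    """Minimal RMSNorm: scale activations by their root mean square (no mean centering, no bias)."""
    def __init__(self, dim: int, eps: float = 1e-6):
        super().__init__()
        self.eps = eps
        self.weight = nn.Parameter(torch.ones(dim))  # learnable per-feature scale

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Root mean square over the feature (last) dimension.
        rms = torch.rsqrt(x.pow(2).mean(dim=-1, keepdim=True) + self.eps)
        return x * rms * self.weight
```

Checking a layer like this against a reference implementation then comes down to loading the same weights into both and asserting the outputs match within floating-point tolerance.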
I also walk through how I approach new open-weight models, how I compare them against reference implementations, and what broader architecture trends emerge from looking at many recent LLMs.
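As a small sketch of that first step: much of a new model's architecture is spelled out in its config file, which can be inspected before writing any code. The checkpoint name below is only an example, and field names vary between model families; this assumes a Transformers-style config.

```python
from transformers import AutoConfig

# Example checkpoint; any open-weight model on the Hugging Face Hub works similarly.
config = AutoConfig.from_pretrained("Qwen/Qwen2.5-0.5B")

print(config.num_hidden_layers)    # number of transformer blocks
print(config.hidden_size)          # width of the residual stream
print(config.num_attention_heads)  # query heads
print(config.num_key_value_heads)  # fewer KV heads than query heads => grouped-query attention
print(config.rms_norm_eps)         # small constants like this are easy to get wrong from scratch
```

The ratio of num_attention_heads to num_key_value_heads already tells you whether the model uses grouped-query attention, one of the KV-cache-related design choices covered later in the talk.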
Chapters:
00:00 Introduction
01:15 Running LLMs locally in Python
02:30 What "Python" means in practice: PyTorch and hardware backends
04:45 The LLM ecosystem: training and inference tools
07:25 Why implementation details matter
09:35 From model releases to architecture diagrams
12:00 Reading model cards and config files
15:50 Debugging architecture implementations
18:00 Comparing against Hugging Face Transformers
20:30 A Gemma 3 RMSNorm example
24:00 A 12-step workflow for understanding new architectures
25:15 The LLM Architecture Gallery
26:10 Architecture trends across recent LLMs
27:30 KV cache motivation
29:40 Grouped-query attention
32:30 Multi-head latent attention
36:40 Sliding window attention
40:00 Sparse and selective attention trends
42:00 KV cache quantization
43:30 LLMs inside agentic software harnesses
46:10 Getting started with LLMs from scratch
48:00 When to use libraries instead of from-scratch code
50:00 Transparent open-source training codebases
52:00 Build a Reasoning Model From Scratch