What I Learned From Implementing LLM Architectures From Scratch (And How to Get Started)

Sebastian Raschka · Beginner · 🧠 Large Language Models · 18h ago
LLM Architecture Gallery: https://llm-gallery.com

In this talk, I discuss what we can learn from implementing LLM architectures from scratch in Python and PyTorch. The main idea is that to really understand how modern LLMs work, it helps to inspect the actual implementation details: attention variants, normalization layers, configuration files, KV cache optimizations, and the small architectural choices that often determine whether a model works correctly. I also walk through how I approach new open-weight models, how I compare them against reference implementations, and what broader architecture trends emerge from looking at many recent LLMs. The full chapter list with timestamps appears below.
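As a taste of the kind of implementation detail the talk's Gemma 3 RMSNorm segment digs into, here is a minimal, dependency-free sketch of RMSNorm. The function name and plain-Python form are illustrative, not taken from the talk; note that Gemma's variant scales by `1 + weight` rather than `weight`, exactly the sort of small deviation that makes comparing against a reference implementation worthwhile.

```python
import math

def rms_norm(x, weight, eps=1e-6):
    """Root-mean-square normalization over a vector.

    Unlike LayerNorm, RMSNorm skips the mean subtraction and bias,
    normalizing only by the RMS statistic before applying a learned
    per-channel scale. (Gemma-style variants use (1 + weight) here.)
    """
    rms = math.sqrt(sum(v * v for v in x) / len(x) + eps)
    return [w * (v / rms) for w, v in zip(weight, x)]
```

With unit weights, the output vector always has an RMS of (approximately) 1, regardless of the input's scale.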


Chapters (24)

0:00 Introduction
1:15 Running LLMs locally in Python
2:30 What "Python" means in practice: PyTorch and hardware backends
4:45 The LLM ecosystem: training and inference tools
7:25 Why implementation details matter
9:35 From model releases to architecture diagrams
12:00 Reading model cards and config files
15:50 Debugging architecture implementations
18:00 Comparing against Hugging Face Transformers
20:30 A Gemma 3 RMSNorm example
24:00 A 12-step workflow for understanding new architectures
25:15 The LLM Architecture Gallery
26:10 Architecture trends across recent LLMs
27:30 KV cache motivation
29:40 Grouped-query attention
32:30 Multi-head latent attention
36:40 Sliding window attention
40:00 Sparse and selective attention trends
42:00 KV cache quantization
43:30 LLMs inside agentic software harnesses
46:10 Getting started with LLMs from scratch
48:00 When to use libraries instead of from-scratch code
50:00 Transparent open-source training codebases
52:00 Build a Reasoning Model From Scratch
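The grouped-query attention chapter covers the main reason the technique exists: shrinking the KV cache. The head-sharing idea can be sketched in a few lines; the function name below is illustrative and not from the talk.

```python
def gqa_head_mapping(num_q_heads, num_kv_heads):
    """Map each query head to the shared K/V head it attends with.

    Grouped-query attention lets several query heads read from one
    key/value head, so only num_kv_heads K/V projections need to be
    cached: e.g. 32 query heads over 8 KV heads cuts the KV cache
    to a quarter of full multi-head attention.
    """
    assert num_q_heads % num_kv_heads == 0
    group_size = num_q_heads // num_kv_heads
    return [q // group_size for q in range(num_q_heads)]
```

With `num_kv_heads == num_q_heads` this reduces to ordinary multi-head attention, and with `num_kv_heads == 1` it becomes multi-query attention, which is why GQA is often described as interpolating between the two.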