What I Learned From Implementing LLM Architectures From Scratch (And How to Get Started)

Sebastian Raschka · Beginner · 🧠 Large Language Models · 18h ago
LLM Architecture Gallery: https://llm-gallery.com

In this talk, I discuss what we can learn from implementing LLM architectures from scratch in Python and PyTorch. The main idea is that to really understand how modern LLMs work, it helps to inspect the actual implementation details: attention variants, normalization layers, configuration files, KV cache optimizations, and the small architectural choices that often determine whether a model works correctly. I also walk through how I approach new open-weight models, how I compare them against reference implementations, and what broader architecture trends emerge from looking at many recent LLMs. The full chapter list with timestamps appears below.
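As a taste of the kind of implementation detail the talk's Gemma 3 RMSNorm segment digs into, here is a minimal, dependency-free sketch of RMSNorm. The function name and plain-Python form are illustrative, not taken from the talk; note that Gemma's variant scales by `1 + weight` rather than `weight`, exactly the sort of small deviation that makes comparing against a reference implementation worthwhile.

```python
import math

def rms_norm(x, weight, eps=1e-6):
    """Root-mean-square normalization over a vector.

    Unlike LayerNorm, RMSNorm skips the mean subtraction and bias,
    normalizing only by the RMS statistic before applying a learned
    per-channel scale. (Gemma-style variants use (1 + weight) here.)
    """
    rms = math.sqrt(sum(v * v for v in x) / len(x) + eps)
    return [w * (v / rms) for w, v in zip(weight, x)]
```

With unit weights, the output vector always has an RMS of (approximately) 1, regardless of the input's scale.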


Chapters (24)

0:00 Introduction
1:15 Running LLMs locally in Python
2:30 What "Python" means in practice: PyTorch and hardware backends
4:45 The LLM ecosystem: training and inference tools
7:25 Why implementation details matter
9:35 From model releases to architecture diagrams
12:00 Reading model cards and config files
15:50 Debugging architecture implementations
18:00 Comparing against Hugging Face Transformers
20:30 A Gemma 3 RMSNorm example
24:00 A 12-step workflow for understanding new architectures
25:15 The LLM Architecture Gallery
26:10 Architecture trends across recent LLMs
27:30 KV cache motivation
29:40 Grouped-query attention
32:30 Multi-head latent attention
36:40 Sliding window attention
40:00 Sparse and selective attention trends
42:00 KV cache quantization
43:30 LLMs inside agentic software harnesses
46:10 Getting started with LLMs from scratch
48:00 When to use libraries instead of from-scratch code
50:00 Transparent open-source training codebases
52:00 Build a Reasoning Model From Scratch
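The grouped-query attention chapter covers the main reason the technique exists: shrinking the KV cache. The head-sharing idea can be sketched in a few lines; the function name below is illustrative and not from the talk.

```python
def gqa_head_mapping(num_q_heads, num_kv_heads):
    """Map each query head to the shared K/V head it attends with.

    Grouped-query attention lets several query heads read from one
    key/value head, so only num_kv_heads K/V projections need to be
    cached: e.g. 32 query heads over 8 KV heads cuts the KV cache
    to a quarter of full multi-head attention.
    """
    assert num_q_heads % num_kv_heads == 0
    group_size = num_q_heads // num_kv_heads
    return [q // group_size for q in range(num_q_heads)]
```

With `num_kv_heads == num_q_heads` this reduces to ordinary multi-head attention, and with `num_kv_heads == 1` it becomes multi-query attention, which is why GQA is often described as interpolating between the two.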