A Visual Tour of Modern LLM Architectures

Sebastian Raschka · Beginner · 🧠 Large Language Models · 2d ago
LLM Architecture Gallery: https://sebastianraschka.com/llm-architecture-gallery/

In this video, I take you on a visual tour of modern LLM architectures and walk through the key ideas behind models like DeepSeek, Qwen3-Next, Kimi, Sarvam, Ling 2.5, and Nemotron. We look at what actually changed in recent LLM design, including grouped-query attention (GQA), sliding-window attention, multi-head latent attention (MLA), DeepSeek sparse attention, and hybrid linear attention. The goal of the gallery is to make it easier to compare architectures side by side, connect the diagrams back to papers, c…
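The video explains these techniques with diagrams rather than code. As a rough, illustrative sketch only (not code from the video or the gallery): the core GQA idea is that several query heads share one key/value group, which shrinks the KV cache by the grouping factor. A minimal NumPy version might look like this, with all shapes and names chosen here for illustration:

```python
import numpy as np

def grouped_query_attention(q, k, v, n_groups):
    """Toy GQA: q has n_heads heads, k/v have only n_groups KV groups.

    q: (n_heads, seq, d)   per-head queries
    k: (n_groups, seq, d)  shared keys (n_groups < n_heads for GQA;
                           n_groups == n_heads recovers standard MHA,
                           n_groups == 1 recovers multi-query attention)
    v: (n_groups, seq, d)  shared values
    """
    n_heads, seq, d = q.shape
    heads_per_group = n_heads // n_groups
    out = np.empty_like(q)
    for h in range(n_heads):
        g = h // heads_per_group  # query head h reads KV group g
        scores = q[h] @ k[g].T / np.sqrt(d)          # (seq, seq)
        # numerically stable softmax over the key axis
        weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
        weights /= weights.sum(axis=-1, keepdims=True)
        out[h] = weights @ v[g]
    return out

# Example: 8 query heads sharing 2 KV groups -> 4x smaller KV cache
rng = np.random.default_rng(0)
q = rng.normal(size=(8, 5, 16))
k = rng.normal(size=(2, 5, 16))
v = rng.normal(size=(2, 5, 16))
out = grouped_query_attention(q, k, v, n_groups=2)
print(out.shape)  # (8, 5, 16)
```

The memory win is in `k` and `v`: at inference time the KV cache stores 2 groups instead of 8 heads here, while the query side keeps its full head count.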
Watch on YouTube ↗

Chapters (14)

0:00 Intro
0:55 Why I built the gallery
1:16 Overview of the LLM Architecture Gallery
4:17 Comparing models side by side
5:41 Benchmarks and the Artificial Intelligence Index
7:03 GPT-2 XL as the baseline architecture
10:22 Grouped-Query Attention (GQA)
14:51 Sliding-Window Attention
18:40 Multi-Head Latent Attention (MLA)
25:31 Sarvam 30B vs 105B
27:41 DeepSeek Sparse Attention
30:24 Hybrid Attention and Qwen3-Next
33:20 Kimi Linear, Ling 2.5, and Nemotron
36:39 Poster, future updates, and wrap-up