How GPT, Claude, and Gemini are actually trained and served – Reiner Pope

Dwarkesh Patel · Beginner · 🧠 Large Language Models · 1w ago
Did a very different format with Reiner Pope: a blackboard lecture where he walks through how frontier LLMs are trained and served. It's shocking how much you can deduce about what the labs are doing from a handful of equations, public API prices, and some chalk.

It's a bit technical, but I encourage you to hang in there - it's really worth it. There are fewer than a handful of people who understand the full stack of AI, from chip design to model architecture, as well as Reiner does. It was a real delight to learn from him.

Reiner is CEO of MatX, a new chip startup (full disclosure - I'm an angel investor). He was previously at Google, where he worked on software efficiency, compilers, and TPU architecture.

I wrote up some flashcards and practice problems to help myself retain what Reiner taught. Hope they're helpful to you too! https://reiner-flashcards.vercel.app/

Download a markdown transcript here to chat with an LLM: https://gist.github.com/dwarkeshsp/79100f0fdeed69d76241903bb0604dbe

0:00:00 – How batch size affects token cost and speed
0:31:59 – How MoE models are laid out across GPU racks
0:47:02 – How pipeline parallelism spreads model layers across racks
1:03:27 – Why Ilya said, "As we now know, pipelining is not wise."
1:18:49 – Because of RL, models may be 100x over-trained beyond Chinchilla-optimal
1:32:52 – Deducing long context memory costs from API pricing
2:03:52 – Convergent evolution between neural nets and cryptography
Watch on YouTube ↗

