Inference Engines (Part 1)

Caleb Writes Code · Beginner · 🤖 AI Agents & Automation · 2w ago
GTC Sessions:
https://www.nvidia.com/gtc/session-catalog/sessions/gtc26-s82448/?ncid=ref-inpa-249-prsp-en-us-1-l33 (Deploying AI Agents at Enterprise Scale)
https://www.nvidia.com/gtc/session-catalog/sessions/gtc26-s81558/?ncid=ref-inpa-249-prsp-en-us-1-l33 (Post-Training Nemotron With RL)

NVIDIA 4080 Super Giveaway: https://docs.google.com/forms/d/1K_70PPbO69ygP32h6PwjDmw8pSeUS97Tk82RVUvHBRY/edit?usp=sharing

Inference is an important but underappreciated topic, especially given the potential gains in how fast and efficiently we can run the underlying models. As models grow and architect…

Chapters (10)

0:00 Intro
1:18 Model Parallelism
2:26 MP Benefits
2:41 SLO
4:19 MP Limitations
4:44 Inference Engine
5:30 Batching
6:46 KV Cache
7:34 Part 2?
7:54 GTC 2026