Inference Engines (Part 1)

Caleb Writes Code · Beginner ·🤖 AI Agents & Automation ·2mo ago
GTC Sessions:
https://www.nvidia.com/gtc/session-catalog/sessions/gtc26-s82448/?ncid=ref-inpa-249-prsp-en-us-1-l33 (Deploying AI Agents at Enterprise Scale)
https://www.nvidia.com/gtc/session-catalog/sessions/gtc26-s81558/?ncid=ref-inpa-249-prsp-en-us-1-l33 (Post-Training Nemotron With RL)

NVIDIA 4080 Super Giveaway: https://docs.google.com/forms/d/1K_70PPbO69ygP32h6PwjDmw8pSeUS97Tk82RVUvHBRY/edit?usp=sharing

Inference is an important but underappreciated topic, especially given how much faster and more efficiently we could run the underlying models. As models grow and architectures become more complex, it's important to understand the key components involved in actually running these models for inference. How have they changed over the years? How have advances in NVMe, PCIe, and HBM affected them? How will SGLang, vLLM, NVIDIA Dynamo, and TensorRT be shaped going forward?

#ai #deeplearning #inference #datacenters

Chapters
00:00 Intro
01:18 Model Parallelism
02:26 MP Benefits
02:41 SLO
04:19 MP Limitations
04:44 Inference Engine
05:30 Batching
06:46 KV Cache
07:34 Part 2?
07:54 GTC 2026
