Agent Inference at the "Speed of Light" — How NVIDIA moves like a $4.3 Trillion Startup
Swyx and Vibhu chat with Nader Khalil (https://x.com/naderlikeladder) and Kyle Kranen (https://x.com/KranenKyle) from NVIDIA about NVIDIA's developer experience (DX) mission, Brev's origin as a one-click way to access GPUs, and how the Brev team (https://brev.dev) integrated after the acquisition. They discuss the security risks of agents that can access files, reach the internet, and execute code, emphasizing that an agent should be limited to only two of those three capabilities and that tools such as OpenClaw should run in isolated environments like Brev VMs. Kyle explains NVIDIA Dynamo as a data-center-scale inference engine that optimizes serving by scaling out, using techniques like prefill/decode disaggregation, scheduling, and Kubernetes-based orchestration, framed around cost, latency, and quality tradeoffs. They also cover NVIDIA's "SOL" (speed of light) first-principles urgency concept, long-context limits and model/hardware co-design, internal model APIs (build.nvidia.com), and upcoming Dynamo and agent sessions at GTC.
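The capability limit mentioned in the description can be sketched in a few lines. This is only an illustration, reading the rule as "at most two of the three capabilities (files, internet, code execution)". The names Capability, MAX_CAPABILITIES, and is_allowed are hypothetical and do not come from the episode, Brev, or any NVIDIA tool.

```python
from __future__ import annotations

from enum import Enum


class Capability(Enum):
    """The three agent capabilities named in the description (labels are hypothetical)."""
    FILES = "files"        # read and write local files
    INTERNET = "internet"  # make network requests
    CODE_EXEC = "code"     # execute code


# Assumption: the rule is read as "at most two of the three".
MAX_CAPABILITIES = 2


def is_allowed(requested: set[Capability]) -> bool:
    """Return True if the agent requests no more than two capabilities."""
    return len(requested) <= MAX_CAPABILITIES


if __name__ == "__main__":
    print(is_allowed({Capability.FILES, Capability.CODE_EXEC}))  # True
    print(is_allowed(set(Capability)))                           # False
```

A check like this would sit in front of whatever launches the agent. The isolation itself (for example, running inside a separate VM) is a distinct safeguard that this sketch does not model.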
Timestamps
00:00 Agent Security Basics
00:39 Podcast Welcome and Guests
01:26 Surfboard Booth Story
03:45 What Brev Does
05:09 GPU Gift Card Stunt
07:34 Acquisition by NVIDIA
09:53 Developer Experience Shift
11:48 DGX Spark on Brev
14:12 Jensen SOL Principle
19:12 Meet Kyle Background
23:08 NVIDIA Culture and Email
25:01 Zero Billion Markets
26:55 From Recs to Dynamo
27:34 Test Time Scaling
28:04 Why Dynamo Exists
29:44 Scale Up vs Scale Out
31:18 Serving Big Models
34:08 Quality Cost Latency
35:59 Just Try Again
37:58 Nemotron Release Stack
39:25 Disaggregated Inference
42:15 Kubernetes Scaling Logic
44:36 Context Length Limits
50:25 Unhobblers and Breakthroughs
54:33 GTC Dynamo Sessions
57:03 Agents and Codex Rollout
58:10 Email Agent Workflow
59:46 Agent Security Boundaries
01:00:35 Running Models In House
01:00:55 NVIDIA Model Playground Story
01:02:28 Enterprise NIM And Dynamo
01:03:43 Hackathons And Self Driving
01:06:11 Zero Minute Agent Challenge
01:08:04 CLI First Agent Workflows
01:09:41 Why CLIs Beat APIs