Agent Inference at the "Speed of Light" — How NVIDIA moves like a $4.3 Trillion Startup

Latent Space Podcast

Swyx and Vibhu chat with Nader Khalil (https://x.com/naderlikeladder) and Kyle Kranen (https://x.com/KranenKyle) of NVIDIA about NVIDIA's DX (developer experience) mission: Brev's origin as a one-click way to access GPUs, and how the Brev team (https://brev.dev) integrated into NVIDIA after the acquisition. They discuss the security risks of agents that can read files, reach the internet, and execute code, recommending that agents be limited to only two of those capabilities and that tools such as OpenClaw run in isolated environments like Brev VMs. Kyle explains NVIDIA Dynamo, a data-center-scale inference engine that optimizes serving by scaling out, using techniques such as prefill/decode disaggregation, scheduling, and Kubernetes-based orchestration, all framed around tradeoffs between cost, latency, and quality. They also cover NVIDIA's "SOL" ("speed of light") concept of first-principles urgency, the limits of long context and model/hardware co-design, NVIDIA's internal model APIs (build.nvidia.com), and the upcoming Dynamo and agent sessions at GTC.

Timestamps
00:00 Agent Security Basics
00:39 Podcast Welcome and Guests
01:26 Surfboard Booth Story
03:45 What Brev Does
05:09 GPU Gift Card Stunt
07:34 Acquisition to Nvidia
09:53 Developer Experience Shift
11:48 DGX Spark on Brev
14:12 Jensen SOL Principle
19:12 Meet Kyle Background
23:08 Nvidia Culture and Email
25:01 Zero Billion Markets
26:55 From Recs to Dynamo
27:34 Test Time Scaling
28:04 Why Dynamo Exists
29:44 Scale Up vs Scale Out
31:18 Serving Big Models
34:08 Quality Cost Latency
35:59 Just Try Again
37:58 Nemotron Release Stack
39:25 Disaggregated Inference
42:15 Kubernetes Scaling Logic
44:36 Context Length Limits
50:25 Unhobblers and Breakthroughs
54:33 GTC Dynamo Sessions
57:03 Agents and Codex Rollout
58:10 Email Agent Workflow
59:46 Agent Security Boundaries
01:00:35 Running Models In House
01:00:55 NVIDIA Model Playground Story
01:02:28 Enterprise NIM And Dynamo
01:03:43 Hackathons And Self Driving
01:06:11 Zero Minute Agent Challenge
01:08:04 CLI First Agent Workflows
01:09:41 Why CLIs Beat APIs
01:15:32
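
The agent-security rule of thumb from the episode (give an agent only two of file access, internet access, and code execution) can be expressed as a simple configuration check. This is a minimal sketch, not code from Brev, OpenClaw, or NVIDIA: the `Capability` enum, `is_allowed`, and `describe` are hypothetical names, and the limit of two is taken from the episode summary.

```python
# Hypothetical sketch: cap an agent at two of three risky capabilities.
# Names here are illustrative and do not come from any NVIDIA or Brev API.
from enum import Enum


class Capability(Enum):
    FILE_ACCESS = "file_access"
    INTERNET_ACCESS = "internet_access"
    CODE_EXECUTION = "code_execution"


# Assumption: the "only two capabilities" limit described in the episode.
MAX_CAPABILITIES = 2


def is_allowed(requested: set[Capability]) -> bool:
    """Return True if the agent requests at most MAX_CAPABILITIES capabilities."""
    return len(requested) <= MAX_CAPABILITIES


def describe(requested: set[Capability]) -> str:
    """Build a human-readable verdict for an agent configuration."""
    names = ", ".join(sorted(c.value for c in requested))
    verdict = "allowed" if is_allowed(requested) else "needs an isolated VM"
    return f"[{names}] -> {verdict}"


if __name__ == "__main__":
    # Files plus code execution, but no internet: within the limit.
    print(describe({Capability.FILE_ACCESS, Capability.CODE_EXECUTION}))
    # All three at once: over the limit.
    print(describe(set(Capability)))
```

Following the episode's advice, an agent that genuinely needs all three capabilities would run in an isolated environment such as a Brev VM rather than on a developer's own machine.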
