Building AI Agents That Survive Production
Haytham Abuelfutuh, Co-founder and CTO of Union.ai and co-author of the open-source orchestrator Flyte, opens the AI Agents 2026 conference in Seattle with a brutally simple message: stop trying to design AI agents that never fail. Build agents that fail cheaply and recover automatically.
In this 25-minute talk, Haytham walks through the three design principles every production agent needs — the 3 D's: Dynamic, Durable, and Defended — and shows what each one actually requires from your platform. He grounds it in a real case study with Dragonfly, who took a laptop prototype to a production agent system indexing 250,000+ products in a single sitting on Flyte 2.
Topics covered:
- The travel agent thought experiment: what 18 years of human agents teach us about long-running sessions, dropped calls, and not asking the user the same question twice
- The show-of-hands problem: why so many teams build agents but so few ever ship them
- The full taxonomy of agent failure: semantic errors, infrastructure errors, network errors, API throttling, and corrupt context
- Dynamic: why agent platforms must run native Python instead of forcing you into a constrained DSL for branching and loops
- Durable: declaring infrastructure inside your code so agents can react to out-of-memory errors (OOMs), spot-instance preemption, and crashes
- Crash recovery for long-running sessions: caching non-deterministic LLM calls and tool calls so agents can resume from the last checkpoint
- Cross-session caching: when to share LLM outputs across users and when to recompute
- Defended: sandboxing agent-generated code with Pydantic Monty and network-isolated execution environments
- Human-in-the-loop bailouts when the agent has exhausted its retries
- Dragonfly case study: a four-tier agent architecture (catalog, coordinator, researcher, tools) for product recommendation across 250K+ products
- Q&A: why Union.ai uses Go and Rust under the Python SDK, and how platform teams can shift agent infrastructure left to developers wi
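
To make the crash-recovery topic above concrete, here is a minimal sketch of caching non-deterministic calls so that a restarted session resumes from its last completed step. It uses only the Python standard library and is not Flyte's or Union.ai's API: `CheckpointStore`, `fake_llm`, and `run_session` are hypothetical names invented for illustration, and the "LLM" is a stub that counts how often it is called.

```python
import hashlib
import json
import tempfile
from pathlib import Path


class CheckpointStore:
    """Hypothetical on-disk cache keyed by step name and inputs."""

    def __init__(self, directory):
        self.directory = Path(directory)
        self.directory.mkdir(parents=True, exist_ok=True)

    def _path(self, step, inputs):
        # Hash the step name and its inputs so identical calls share one entry.
        payload = json.dumps({"step": step, "inputs": inputs}, sort_keys=True)
        digest = hashlib.sha256(payload.encode("utf-8")).hexdigest()
        return self.directory / f"{digest}.json"

    def run(self, step, inputs, fn):
        path = self._path(step, inputs)
        if path.exists():
            # Replay the recorded output instead of repeating the call.
            return json.loads(path.read_text())["output"]
        output = fn(**inputs)
        # Write to a temp file, then rename, so a crash mid-write
        # does not leave a half-written checkpoint behind.
        tmp = path.with_suffix(".tmp")
        tmp.write_text(json.dumps({"output": output}))
        tmp.replace(path)
        return output


calls = {"llm": 0}


def fake_llm(prompt):
    """Stand-in for a non-deterministic LLM call (assumption, not a real API)."""
    calls["llm"] += 1
    return f"plan for: {prompt} (call #{calls['llm']})"


def run_session(store, crash_after_plan=False):
    plan = store.run("plan", {"prompt": "find hiking boots"}, fake_llm)
    if crash_after_plan:
        raise RuntimeError("simulated preemption")
    summary = store.run("summarize", {"prompt": plan}, fake_llm)
    return plan, summary


workdir = tempfile.mkdtemp()

# First attempt crashes after the "plan" step has been checkpointed.
try:
    run_session(CheckpointStore(workdir), crash_after_plan=True)
except RuntimeError:
    pass

# The retry reuses the cached plan and only runs the remaining step.
plan, summary = run_session(CheckpointStore(workdir))
```

In this sketch the stub is invoked twice in total rather than three times, because the retry replays the stored plan. Keying the cache on a per-session directory keeps results private to one session; whether to share entries across users is the separate cross-session question the talk raises.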