Dark Factory: OpenClaw Ships Faster Than You Can Read the Diff — Vincent Koc, Comet ML

AI Engineer · Intermediate · 🤖 AI Agents & Automation · 2d ago
Static benchmarks made sense for static software. Agents that adapt to users, rewrite their own harnesses, and shift behavior over time break that assumption. This talk is about what evaluation looks like when the system you're measuring keeps changing underneath you. Vincent Koc traces the arc from prompt engineering to context engineering to intent engineering, where agents self-optimize toward what users actually want. The eval problem compounds at each step: production traces reveal behavioral drift, test suites go stale, and the 20% of edge cases that break your product rarely show up in handcrafted datasets. The alternative he proposes: define the end state, let agents curate their own suites from traces, and treat evals as a living system rather than a point-in-time snapshot.

Speaker info: https://x.com/vincent_koc
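
To make the proposal in the description more concrete (define the end state, then grow the eval suite from production traces), here is a minimal sketch. It is not taken from the talk or from any Comet ML tooling: the `Trace` shape, the `curate_suite` function, and the `reached_end_state` check are all hypothetical names chosen for illustration, and a real system would likely use a richer end-state check and smarter deduplication than lowercased string matching.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Trace:
    """One recorded production interaction (hypothetical shape)."""
    user_input: str
    agent_output: str


def normalize(text: str) -> str:
    """Crude dedup key: trimmed, lowercased input (an assumption, not a standard)."""
    return text.strip().lower()


def curate_suite(
    traces: list[Trace],
    reached_end_state: Callable[[Trace], bool],
    covered_keys: set[str],
) -> list[Trace]:
    """Return traces that miss the end state and are not yet covered by the suite."""
    seen = set(covered_keys)  # copy so the caller's set is not mutated
    new_cases: list[Trace] = []
    for trace in traces:
        key = normalize(trace.user_input)
        if key in seen:
            continue  # an equivalent input is already in the suite
        if not reached_end_state(trace):
            new_cases.append(trace)
            seen.add(key)
    return new_cases


# Example: the end state is "a refund was issued" (hypothetical check).
traces = [
    Trace("Refund order 17", "Refund issued."),
    Trace("refund order 99 ", "I cannot help with that."),
    Trace("Refund order 99", "I cannot help with that."),
]
new_cases = curate_suite(
    traces,
    reached_end_state=lambda t: "refund issued" in t.agent_output.lower(),
    covered_keys=set(),
)
```

Re-running a function like this on each new batch of traces, and passing in the keys of cases already collected, is one simple way to treat the suite as something that keeps growing rather than a fixed snapshot.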