How We Built LangSmith Engine | Interrupt 26

LangChain · Beginner · 🤖 AI Agents & Automation · 1mo ago

Key Takeaways

LangSmith Engine uses AI agents to automate the agent-improvement cycle: reading production traces, writing evals, and proposing fixes for the issues it finds.

Original Description

Until now, improving your agent has been a manual process of reading traces, looking for patterns, writing evals, and creating fixes. Now LangSmith Engine can run that cycle for you. It watches your production traces, clusters failures into named issues, diagnoses root causes against your code, and proposes fixes and eval coverage to keep regressions from coming back. You just review and merge the improvements. At Interrupt, LangChain's agent conference, Ben Tannyhill and Vivek Trivedy introduced LangSmith Engine and what it unlocks for teams running agents at scale.

Extra resources:
• Everything we shipped at Interrupt: https://www.langchain.com/blog/interrupt-2026-
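The description above says Engine clusters failures into named issues and presents them as a prioritized inbox. The talk does not publish how that grouping is implemented, so the following is only a minimal sketch of the idea, assuming each failed trace has already been given a short failure label (for example by an upstream LLM judge or online evaluator). `Trace`, `failure_label`, and `build_issue_inbox` are hypothetical names for illustration, not part of the LangSmith API.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Trace:
    # Hypothetical trace record; real LangSmith runs carry far more fields.
    trace_id: str
    failed: bool
    # Assumed to be produced upstream, e.g. by an evaluator that names the failure.
    failure_label: Optional[str] = None


def build_issue_inbox(traces: List[Trace]) -> List[Tuple[str, List[str]]]:
    """Group failed traces by failure label and rank issues by frequency."""
    issues: Dict[str, List[str]] = defaultdict(list)
    for trace in traces:
        # Only failed traces with a label contribute to a named issue.
        if trace.failed and trace.failure_label:
            issues[trace.failure_label].append(trace.trace_id)
    # Most frequent issues first; ties broken alphabetically for a stable order.
    return sorted(issues.items(), key=lambda kv: (-len(kv[1]), kv[0]))
```

Ranking by occurrence count is one simple prioritization heuristic; a production system might also weigh severity, recency, or user impact.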


Chapters (28)

0:00 Introduction and context
0:33 LangChain as the Agent Engineering Platform
0:50 Our go-to-market agent and the problems we hit
1:47 Why the current process is broken (customer pain)
2:48 What we set out to build
2:45 LangSmith Engine demo: the prioritized issue inbox
3:14 Engine proposes fixes and opens PRs
3:32 Custom online evaluators
3:46 Dataset examples for offline evals
4:28 Architecture overview: how Engine works end-to-end
5:18 Early customers: Clay, Vanta, Campfire
5:23 The first version: a wind-up toy
6:54 The false positive problem ("Show me the man")
7:53 Architecture deep dive: orchestration and sandboxes
9:49 Why traces are the most valuable input
10:47 Connecting source code for PR generation
11:10 Types of fixes Engine generates
12:02 Learning from customers: the preference problem
12:56 The agent overview: Engine's memory file
13:40 Passing to Viv: evaluating Engine itself
14:04 Why evals are the only answer
14:31 How we bootstrapped evals (dogfooding + synthetic data)
15:24 Building a diverse and rounded eval suite
16:14 How evals inform model selection and prompt decisions
17:41 Beyond evals: trusting user feedback
18:24 The self-improving loop: Engine improving Engine
19:04 Key learnings and closing summary
20:36 Thank you