Why We Built LangSmith for Improving Agent Quality

LangChain · Beginner · 🤖 AI Agents & Automation · 6mo ago

Skills: Agent Foundations 90%, Tool Use & Function Calling 80%, RAG Evaluation 70%, Multi-Agent Systems 60%, Autonomous Workflows 60%

Harrison Chase (CEO of LangChain) sits down with Bagatur (LangSmith Engineer) and Tanushree (Product Manager) for a technical roundtable on bringing production agents from prototype to rigor. They discuss the evolution of LangSmith's platform, dive deep into the new Insights Agent feature for automatically discovering patterns in production traces, and explore Multi-turn Evaluations for understanding end-to-end user interactions.

00:00 - Introductions + the evolution of LangSmith
02:39 - Introducing Insights Agent
03:49 - Real-world use cases for Insights Agent
04:44 - Customizing insights for your specific use case
05:22 - The algorithm behind Insights Agent
06:30 - The hardest part of getting Insights to work
07:13 - Tips for getting started with Insights
08:47 - Evals vs Insights - what's the difference
09:36 - What are Threads and why do they matter?
11:59 - Offline vs online evals
12:46 - Multi-turn evals for measuring agent performance in production
13:19 - Thread-level metrics and workflows
14:22 - The hot take: "Are evals dead?"
16:08 - The future of testing
17:08 - Closing thoughts

Read more about our latest LangSmith updates: https://bit.ly/3WrUNDZ
Learn more about Insights Agent: https://docs.langchain.com/langsmith/insights
Learn more about Multi-turn Evals: https://docs.langchain.com/langsmith/online-evaluations#configure-multi-turn-online-evaluators
