LangSmith Highlights: Human Annotation

LangChain · Intermediate · AI Agents & Automation · 2y ago

Key Takeaways

The video demonstrates how to add human feedback to annotate runs in LangSmith, including tagging a run with feedback and checking out the annotation queue.

Full Transcript

One of the things that we help you do in LangSmith is add human feedback to annotate your runs. We just showed how you can use automatic evaluation to have LLMs grade your runs, or to programmatically auto-evaluate each of your runs, but there's really no substitute for a human adding annotation feedback on runs as well. You might do this for a couple of reasons. Maybe there's some kind of measure that's hard to evaluate automatically, or maybe you've used auto-evals on thousands of runs and you want a human to pick through a small subset of those runs to make sure that your LLM grader is still doing a good job. I'll show how to do that in this video.

This is a test run. We have some feedback already recorded on each of these runs: things like correctness, helpfulness, and sensitivity, as well as embedding cosine distance. What we're going to do is pick all of the runs that had a correct score, grab them all, and send them to an annotation queue. We're going to add this second human review annotation queue, and now all of these runs will be queued up so that we can easily go through and add our own feedback.

We can see here all of the tags this run already has, but maybe we want a different kind of feedback, like creativity, which is harder for LLMs to grade, and it can have a score of one to five. I'm just making this up, but this one is a creativity score of two. Maybe you have a rubric that a human evaluator wants to follow. This one is done. I can now add a score of, again, creativity to this run here; we'll give it a score of five. This one is done. You can see how I can quickly add tags to each of these runs and add some additional feedback manually to the ones that are in my queue. So if I'm supporting a flow of making sure that each of the runs has a good response, I can do that pretty seamlessly within the annotation queue.

And we're all caught up, meaning I have no more to review. This is really helpful if you're in a support role or you're helping curate datasets, to make sure that you have the appropriate tags and feedback on each of your runs.
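
The spot-check idea from the transcript (a human reviews a small subset of auto-evaluated runs to see whether the LLM grader still agrees with them) can be sketched in plain Python. This is only an illustration under stated assumptions: `Run`, `sample_for_review`, `agreement_rate`, and the `human_correctness` key are hypothetical names invented here, not part of the LangSmith SDK, and the runs are in-memory stand-ins for real traces.

```python
import random
from dataclasses import dataclass, field


@dataclass
class Run:
    """Hypothetical stand-in for a logged run with feedback scores by key."""
    run_id: str
    feedback: dict = field(default_factory=dict)


def sample_for_review(runs, k, seed=0):
    """Pick up to k runs at random for a human to review."""
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    return rng.sample(runs, min(k, len(runs)))


def agreement_rate(runs, auto_key, human_key):
    """Fraction of runs where the human score equals the automatic score.

    Only runs that carry both scores are counted. Returns None when no
    run has both, so the caller can tell "no data" apart from 0.0.
    """
    both = [r for r in runs if auto_key in r.feedback and human_key in r.feedback]
    if not both:
        return None
    matches = sum(1 for r in both if r.feedback[auto_key] == r.feedback[human_key])
    return matches / len(both)


# Assumed example data: 1000 runs that an automatic grader marked correct.
runs = [Run(f"run-{i}", {"correctness": 1}) for i in range(1000)]
subset = sample_for_review(runs, k=20)

# A human reviewer would fill these in; here two scores are entered by hand.
subset[0].feedback["human_correctness"] = 1
subset[1].feedback["human_correctness"] = 0

rate = agreement_rate(subset, "correctness", "human_correctness")
```

A low agreement rate on the reviewed subset would suggest the automatic grader needs attention before its scores on the remaining runs are trusted.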

Original Description

See how to:
- Tag a run with feedback
- Check out your annotation queue

Log in or sign up for LangSmith (BETA): https://smith.langchain.com/

This video shows how to use LangSmith to annotate runs with human feedback, complementing automatic evaluation. By sending selected runs to an annotation queue, a reviewer can work through them one at a time and add scores and tags for criteria such as creativity.

Key Takeaways

Log in to LangSmith
Select a test run
Pick the runs that were scored as correct
Send runs to annotation queue
Add human feedback and tags to each run
Review and score runs based on creativity or other custom criteria
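
The steps above can be sketched as a small in-memory model: filter runs by an existing score, queue them, then record a human score on each until the queue is empty. This is a minimal sketch, not the LangSmith API; `Run`, `AnnotationQueue`, and its methods are hypothetical names made up for illustration, and the 1 to 5 creativity scale follows the example in the video.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Run:
    """Hypothetical stand-in for a logged run with feedback scores by key."""
    run_id: str
    feedback: dict = field(default_factory=dict)


class AnnotationQueue:
    """Illustrative first-in, first-out queue of runs awaiting human review."""

    def __init__(self, name):
        self.name = name
        self._pending = deque()

    def add_runs(self, runs):
        self._pending.extend(runs)

    def __len__(self):
        return len(self._pending)

    def annotate_next(self, key, score, low=1, high=5):
        """Record a human score on the next run and mark it done."""
        if not low <= score <= high:
            raise ValueError(f"{key} score must be between {low} and {high}")
        run = self._pending.popleft()  # raises IndexError if nothing is queued
        run.feedback[key] = score
        return run


# Assumed example data: three runs with an automatic correctness score.
runs = [
    Run("run-a", {"correctness": 1}),
    Run("run-b", {"correctness": 0}),
    Run("run-c", {"correctness": 1}),
]

# Select the runs scored as correct and send them to the queue.
queue = AnnotationQueue("human review")
queue.add_runs(r for r in runs if r.feedback.get("correctness") == 1)

# A reviewer scores creativity on each queued run.
queue.annotate_next("creativity", 2)
queue.annotate_next("creativity", 5)

caught_up = len(queue) == 0  # nothing left to review
```

Validating the score range at the point of entry keeps a reviewer's typo from silently landing in the feedback record.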

💡 Human feedback is essential for improving LLM performance, especially for tasks that are difficult to evaluate automatically, such as creativity.
