Corrections + Few Shot Examples (Part 1) | LangSmith Evaluations

LangChain · Intermediate · ✍️ Prompt Engineering · 1y ago

Skills: Prompt Craft 90%, LLM Engineering 70%

Evaluation is central to continuously improving your LLM application. It requires a way to judge your application's outputs, which are often natural language. Using an LLM to grade natural language outputs (e.g., for correctness relative to a reference answer, tone, or conciseness) is a popular approach, but it requires prompt engineering and careful auditing of the LLM judge! Our new release of LangSmith presents a solution to this growing problem, allowing a user to (1) correct LLM-as-a-Judge outputs and then (2) pass those corrections back to the judge as few-shot examples for future iterations. This creates LLM-as-a-Judge evaluators grounded in human feedback that better encode your preferences without the need for challenging prompt engineering. Here we show how to apply Corrections + Few Shot to online evaluators that are pinned to a project.
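To make the two steps concrete, here is a minimal sketch in plain Python of how human corrections could be folded back into a judge prompt as few-shot examples. It does not use the LangSmith SDK or reproduce how LangSmith implements the feature. The `Correction` class, the `build_judge_prompt` function, and the prompt wording are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class Correction:
    """A human-corrected judge verdict (hypothetical structure, not a LangSmith type)."""
    question: str
    answer: str
    judge_score: int      # score the LLM judge originally assigned
    corrected_score: int  # score the human reviewer replaced it with
    explanation: str      # reviewer's reason for the correction


# Assumed grading instructions; a real evaluator prompt would be task-specific.
BASE_INSTRUCTIONS = (
    "You are grading an answer for correctness. "
    "Reply with 1 if the answer is correct and 0 otherwise."
)


def build_judge_prompt(corrections, question, answer, max_examples=5):
    """Build a judge prompt that includes the most recent corrections as few-shot examples."""
    parts = [BASE_INSTRUCTIONS]
    # Keep only the newest corrections so the prompt stays bounded in size.
    recent = corrections[-max_examples:] if max_examples > 0 else []
    if recent:
        parts.append("Here are examples graded by a human reviewer:")
    for c in recent:
        # The few-shot example shows the corrected score, not the judge's original one.
        parts.append(
            f"Question: {c.question}\n"
            f"Answer: {c.answer}\n"
            f"Score: {c.corrected_score}\n"
            f"Reason: {c.explanation}"
        )
    parts.append(f"Now grade this one.\nQuestion: {question}\nAnswer: {answer}\nScore:")
    return "\n\n".join(parts)


if __name__ == "__main__":
    corrections = [
        Correction("What is 2+2?", "5", 1, 0, "The arithmetic is wrong."),
    ]
    print(build_judge_prompt(corrections, "What is the capital of France?", "Paris"))
```

The resulting string would be sent to whichever model acts as the judge. Each new correction changes the examples the judge sees on later runs, which is the feedback loop the description outlines.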



