Evaluating Deep Agents using LangSmith on AWS

📰 AWS Machine Learning

A practical guide to evaluating deep agents using LangSmith on AWS

intermediate Published 28 May 2026

Action Steps

Apply five evaluation patterns to assess deep agent performance
Build offline evaluations using pytest and LangSmith
Configure online monitoring to track agent performance in production in real time
Use LangSmith on AWS to streamline the evaluation process for deep agents
Deploy a text-to-SQL deep agent with Amazon Bedrock across the full development-to-production lifecycle

Who Needs to Know This

Machine learning engineers and developers who build deep agents can use this guide to evaluate and improve them before and after production deployment.

Key Insight

💡 Reliable production deployments depend on evaluating deep agents, and LangSmith on AWS can simplify that process.

Full Article

This post combines lessons from LangChain’s work on evaluating deep agents and Anthropic’s guide to demystifying evals for AI agents into a practical guide. You will learn how to: 1) apply five evaluation patterns for deep agents, 2) build offline evaluations using pytest and LangSmith, and 3) configure online monitoring for production. The walkthrough follows a text-to-SQL deep agent with Amazon Bedrock through the full development-to-production lifecycle.
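
To make the offline evaluation idea concrete, here is a minimal sketch of a pytest-style check for a text-to-SQL agent. It is not taken from the article: the `fake_agent` stub, the `orders` table, and the `results_match` helper are all hypothetical, and the LangSmith and Amazon Bedrock integration is deliberately left out. The check compares the rows returned by the generated SQL against the rows returned by a reference query, which tolerates differences in how the SQL is written.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    """Build a tiny in-memory database to evaluate against (illustrative data)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, total REAL)"
    )
    conn.executemany(
        "INSERT INTO orders (customer, total) VALUES (?, ?)",
        [("alice", 30.0), ("bob", 12.5), ("alice", 7.5)],
    )
    return conn


def fake_agent(question: str) -> str:
    """Hypothetical stand-in for the real agent; returns a SQL string."""
    canned = {
        "How much has alice spent in total?":
            "SELECT SUM(total) FROM orders WHERE customer = 'alice'",
    }
    return canned[question]


def results_match(conn: sqlite3.Connection, generated_sql: str, reference_sql: str) -> bool:
    """Execution-based check: both queries must return the same rows, order ignored."""
    got = sorted(conn.execute(generated_sql).fetchall())
    want = sorted(conn.execute(reference_sql).fetchall())
    return got == want


def test_total_spend_for_customer():
    conn = make_db()
    generated = fake_agent("How much has alice spent in total?")
    # The reference query is phrased differently on purpose; only results are compared.
    reference = (
        "SELECT SUM(total) FROM orders WHERE customer = 'alice' GROUP BY customer"
    )
    assert results_match(conn, generated, reference)
```

Saved as a test file, this runs under plain `pytest`. In a real setup the stub would be replaced by a call to the deployed agent, and each question and reference query would typically come from a curated dataset instead of being hard-coded.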
