Tips for building AI agents

Anthropic · Intermediate · 🧠 Large Language Models · 1y ago

Skills: Agent Foundations 90% · LLM Foundations 80% · Tool Use & Function Calling 70%

Key Takeaways

Anthropic's experts discuss building effective AI agents, covering topics such as agent foundations, autonomous systems, and LLM engineering, with a focus on prompt engineering, tool use, and function calling. They highlight the importance of understanding model limitations, documentation, and clear instructions in agent design, as well as the potential of agents for tasks like search, coding, and automation.

Full Transcript

Erik: I feel like agents for consumers are fairly overhyped right now. [inaudible] Trying to have an agent fully book a vacation for you is almost just as hard as going and booking it yourself.

Alex: Take one. Today we're going behind the scenes on one of our recent blog posts, "Building Effective Agents." I'm Alex. I lead Claude Relations here at Anthropic.

Erik: I'm Erik. I'm on the research team at Anthropic.

Barry: I'm Barry. I'm on the Applied AI team.

Alex: I'm going to kick us off here. For viewers just jumping in, what's the quick version of what an agent actually is? There are a million definitions of it. And why should a developer, or somebody who's actually building with AI, care about these things? Erik, maybe we can start with you.

Erik: Sure. I think something we explored in the blog post is that, first of all, a lot of people have been saying everything is an agent, referring to almost anything more than just a single LLM call. One of the things we tried to do in the blog post is really separate this out. There are workflows, which is where you have a few LLM calls chained together. And what we think an agent really is, is where you're letting the LLM decide how many times to run. You're having it continue to loop until it's found a resolution. That could be talking to a customer for customer support, or it could be iterating on code changes, but it's something where you don't know how many steps it's going to take to complete. That's really what we consider an agent.

Alex: Interesting. So in the definition of an agent, we are letting the LLM pick its own fate and decide what actions to take, instead of us predefining a path for it.

Erik: Exactly. It's more autonomous, whereas a workflow you can think of as being on rails through a fixed number of steps.

Alex: I see. So this distinction, I assume, was the result of many, many conversations with customers and working with different
teams and even trying things ourselves. Barry, can you speak more to what that looked like as we got to create this divide between a workflow and an agent, and what sort of patterns surprised you the most as you were going through this?

Barry: Sure. Honestly, I think all of this evolved as the models got better and teams got more sophisticated. We both work with a large number of customers who are very sophisticated, and we went from having a single LLM, to having a lot of LLMs, and eventually to having LLMs orchestrating themselves. One of the reasons we decided to create this distinction is that we started to see these two distinct patterns: workflows, which are pre-orchestrated by code, and agents, which are simpler in one sense but more complex in another, a different shape that we're starting to see. As the models and all of the tools get better, agents are becoming more and more prevalent and more and more capable, and that's when we decided this is probably a good time for us to give a formal definition.

Alex: So in practice, if you're a developer implementing one of these things, what would that actually look like in your code as you're starting to build this? Maybe we go down to the prompt level here. What does an agent prompt or flow look like, and what does a workflow look like?

Erik: Yeah, so I think a workflow prompt looks like this: you have one prompt, you take the output of it and feed it into prompt B, take the output of that and feed it into prompt C, and then you're done. There's this straight line, a fixed number of steps, and you know exactly what's going to happen. Maybe you have some extra code that checks the intermediate results and makes sure they're okay, but you know exactly what's going to happen in one of these paths. And each of
those prompts is a very specific prompt, just taking one input and transforming it into another output. For instance, maybe one of these prompts is taking in the user question and categorizing it into one of five categories, so that the next prompt can be more specific for that. In contrast, an agent prompt will be much more open-ended, and will usually give the model tools or multiple things to check and say: here's the question, and you can do web searches, or you can edit these code files or run code, and keep doing this until you have the answer.

Alex: I see. So that's a few different use cases there. That makes sense. Now that we've covered at a high level how we're thinking about these workflows and agents, and talked about the blog post, I want to dive even further behind the scenes. Were there any funny stories, Barry, of wild things that you saw from customers that were interesting, or just far out there in terms of how people are starting to actually use these things in production?

Barry: Yeah, this is actually from my own experience building agents. I joined about a month before the Sonnet v2 refresh, and one of my onboarding tasks was to run OSWorld, which is a computer use benchmark. For a whole week, me and this other engineer were just staring at these agent trajectories that were counterintuitive to us. We weren't sure why the model was making the decisions it was, given the instructions that we gave it. So we decided we were going to act like Claude and put ourselves in that environment. We would do this really silly thing where we close our eyes for a whole minute, then blink at a screen for a second, and then close our eyes again and just think: well, I have to write Python code to operate in this environment, what
would I do? And suddenly it made a lot more sense. I feel like a lot of agent design comes down to that. There's a lot of context and a lot of knowledge that the model maybe does not have, and we have to be empathetic to the model and make a lot of that clear in the prompt, in the tool descriptions, and in the environment.

Alex: I see. So a tip here for developers is almost to act as if you are looking through the lens of the model itself: what would be the most applicable instructions here, and how is the model seeing the world, which is very different from how we operate as humans, with additional context. Erik, I'm curious if you have any other stories that you've seen.

Erik: Yeah, in a very similar vein, I think a lot of people really forget to do this. Maybe the funniest thing I see is that people will put a lot of effort into creating these really beautiful, detailed prompts, and then the tools that they make to give the model are incredibly bare-bones: no documentation, the parameters are named A and B. An engineer wouldn't be able to work with this if it were a function they had to use, because there's no documentation, so how can you expect Claude to use it as well? It's that lack of putting yourself in the model's shoes. I think a lot of people, when they start trying to use tool use and function calling, forget that they have to prompt as well. They think about the model as a more classical programming system, but it is still a model, and you need to be prompt engineering in the descriptions of your tools themselves.

Alex: Yeah, I've noticed that. People forget that it's all part of the same prompt. It's all
getting fed into the same prompt in the context window, and writing a good tool description influences other parts of the prompt as well. So that is one aspect to consider. "Agents" is kind of the hype term right now. A lot of people are talking about it, and there have been plenty of articles written and videos made on the subject. What made you guys think that now is the right time to write something ourselves and talk a little bit more about the details of agents?

Barry: Sure. I think one of the most important things for us is to be able to explain things well. That's a big part of our motivation: we walk into customer meetings and everything is referred to by a different term, even though they share the same shape. So we thought it would be really useful if we could have a set of definitions, and a set of diagrams and code, to explain these things to our customers. And we are getting to the point where the model is capable of doing a lot of the agentic workflows that we're seeing, so that seemed like the right time for us to have some definitions, just to make these conversations easier.

Erik: Yeah, I think for me, I saw that there was a lot of excitement around agents, but also that a lot of people really didn't know what it meant in practice. So they were trying to bring agents to any problem they had, even when much simpler systems would work. I saw that as one of the reasons we should write this: to guide people on how to do agents, but also on where agents are appropriate, and that you shouldn't go after a fly with a bazooka.

Alex: I see. That was a perfect segue to my next question. There's a lot of talk about the potential of agents, and every developer out there, and every startup and business, is trying to think about how they can build their own version of
an agent for their company or product. But you guys are starting to see what actually works in production. So we're going to play a little game here. I want to know one thing that's overhyped about agents right now, and also one thing that's underhyped, in terms of implementations, actual uses in production, or potential. Erik, let's start with you.

Erik: I feel like underhyped is things that save people time, even if it's a very small amount of time. A lot of times, if you just look at that on the surface, it's: oh, this is something that takes me a minute, and even if you can fully automate it, it's only a minute, so what help is that? But really, that changes the dynamics: now you can do that thing a hundred times more than you previously would. So I'm most excited about things that, if they were easier, could be really scaled up.

Barry: I don't know if this is necessarily related to hype, but I think it's really difficult to calibrate right now where agents are really needed. There's this intersection that's a sweet spot for using agents: a set of tasks that are valuable and complex, but where the cost of error, or the cost of monitoring for errors, is relatively low. That set of tasks is not super clear and obvious unless we actually look into the existing processes. I think coding and search are two pretty canonical examples where agents are very useful. Take search as an example. It's a really valuable task, and it's very hard to do deep, iterative search, but you can always trade off some precision for recall, get a few more documents or a little more information than is needed, and filter it down. So we've seen a lot of success there with agentic search.

Alex: What does a coding agent look like right now?

Erik: Coding agents, I think, are
super exciting because they are verifiable, at least partially. Code has this great property that you can write tests for it, then you edit the code, and either the tests pass or they don't. Now, that assumes you have good unit tests, which I think every engineer in the world can say we don't. But at least it's better than a lot of things. There's no equivalent way to do that for many other fields. So this gives a coding agent some way to get more signal every time it goes through a loop. If every time it runs the tests again it sees what the error or the output is, that makes me think the model can converge on the right answer by getting this feedback. If you don't have some mechanism to get feedback as you're iterating, you're not injecting any more signal, you're just going to have noise. So without something like this, there's no reason that an agent will converge to the right answer.

Alex: I see. So what are the biggest blockers, then, in terms of improving agentic performance on coding at the moment?

Erik: For coding, we've seen over the last year on SWE-bench that results have gone from very, very low to, I think, over 50% now, which is really incredible. So the models are getting really good at writing code to solve these issues. I have a slightly controversial take here: I think the next limiting factor is going to come back to that verification. It's great for these cases where we do have perfect unit tests, and that's starting to work, but for real-world cases we usually don't have perfect unit tests. So what I'm thinking about now is finding ways that we can verify, and add tests for the things that you really care about, so that the model itself can test this and know whether it's right or wrong before it goes back
to the human.

Alex: I see. Making sure that we can embed some sort of feedback loop into the process.

Erik: Exactly. Is it right or wrong?

Alex: Okay. What does the future of agents look like in 2025? Barry, we're going to start with you.

Barry: I think that's a really difficult question. This is probably not a practical thing, but one thing I've been really interested in is what a multi-agent environment will look like. I think I've already shown Erik this: I built an environment where a bunch of Claudes can spin up other Claudes and play Werewolf together, and it's a completely...

Alex: What is Werewolf?

Barry: Werewolf is a social deduction game where all of the players are trying to figure out what each other's role is. It's very similar to Mafia. It's entirely text-based, which is great for Claude to play in.

Alex: I see. So we have multiple different Claudes playing different roles within this game, all communicating with each other.

Barry: Yeah, exactly. And you see a lot of interesting interactions in there that you just haven't seen before. That's something I'm really excited about. Very similar to how we went from a single LLM to multiple LLMs, I think by the end of the year we could potentially see us going from a single agent to multi-agent. There are some interesting research questions to figure out in that domain.

Alex: In terms of how the agents interact with each other, and what this kind of emergent behavior looks like as you coordinate between agents doing different things?

Barry: Exactly, and whether this is actually going to be useful, or better than a single agent with access to a lot more resources.

Alex: Do we see any multi-agent approaches right now that are actually working in production?

Barry: I feel like in production we haven't even seen a lot of successful single agents.

Alex: Okay, interesting.

Barry: But this is kind of a potential extension of successful
agents, with the improved capabilities of the next couple of generations of models. So this is not advice that everyone should go explore multi-agent environments. It's just that this provides us with a better way to understand model behaviors.

Alex: I see. Okay, Erik, what's the future of agents in 2025?

Erik: I feel like in 2025 we're going to see a lot of business adoption of agents, starting to automate a lot of repetitive tasks and really scale up a lot of things that people wanted to do more of before but that were too expensive. You could now do 10x or 100x as much of these things. I'm imagining things like every single pull request triggering a coding agent to come and update all of your documentation. Things like that would have been cost-prohibitive before, but once you think of agents as almost free, you can start adding these bells and whistles everywhere. Maybe something that's not going to happen yet, going back to what's overhyped: I feel like agents for consumers are fairly overhyped right now. [inaudible] Like we talked about with verifiability, I think that for a lot of consumer tasks it's almost as much work to fully specify your preferences and what the task is as to just do it yourself, and it's very expensive to verify. So trying to have an agent fully book a vacation for you: describing exactly what you want your vacation to be and your preferences is almost just as hard as going and booking it yourself.

Alex: Interesting.

Erik: And it's very high risk. You don't want the agent to actually go book a plane flight without you first accepting it.

Alex: Is there a matter of context that we're missing here too, from the models being able to infer this information about somebody without having to explicitly go ask, and learn the preferences over time?
Erik: Yeah, I think that these things will get there, but first you need to build up this context so that the model already knows your preferences, and I think that takes time. We'll need some stepping stones to get to bigger tasks like planning a whole vacation.

Alex: I see. Okay, very interesting. Last question: any advice that you'd give to a developer who's exploring this right now, in terms of starting to build this, or just thinking about it from a general future-proofing perspective?

Erik: I feel like my best advice is to make sure that you have a way to measure your results. I've seen a lot of people go and build in a vacuum, without any way to get feedback about whether what they're building is working or not. You can end up building a lot without realizing that either it's not working, or maybe something much simpler would have actually done just as good a job.

Barry: Yeah, I think very similarly: start as simple as possible, and have that measurable result as you build more complexity into it. One thing I've been really impressed by is that I work with some really resourceful startups, and they can do everything within one LLM call. The orchestration around it, the code, which will persist even as the model gets better, is kind of their niche. I always get very happy when I see one of those, because I think they can reap the benefit of future capability improvements. And realistically, we don't know what use cases will be great for agents, and the landscape is going to shift, but it's probably a good time to start building up some of that muscle to think in agent land, just to understand that capability a little bit better.

Erik: Yeah, I want to double-click on something you said about being excited for the models to get
better. I think that if you look at your startup or your product and think, "Oh man, if the models get smarter, all of our moat is going to disappear," that means you're building the wrong thing. Instead, you should be building something so that as the models get smarter, your product gets better and better.

Alex: Right, that's great advice. Erik, Barry, thank you guys. This is "Building Effective Agents."

Erik: Thank you.

Barry: Thanks.

[Music]
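The workflow versus agent distinction that Erik draws early in the conversation can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not code from the blog post: the `llm` callable stands in for a single model call, and the "FINAL:" and "<tool_name> <tool_input>" reply conventions, along with every function name here, are hypothetical choices made for the example.

```python
from typing import Callable, Dict, List

# Hypothetical stand-in for one LLM call: takes a prompt, returns text.
LLM = Callable[[str], str]
Tool = Callable[[str], str]


def run_workflow(question: str, llm: LLM) -> str:
    """Workflow: a fixed chain of three prompts, always exactly three calls."""
    category = llm(f"Categorize this question: {question}")
    draft = llm(f"Answer this {category} question: {question}")
    return llm(f"Polish this answer: {draft}")


def run_agent(task: str, llm: LLM, tools: Dict[str, Tool], max_steps: int = 10) -> str:
    """Agent: the model picks each action and decides when it is done."""
    history: List[str] = [f"Task: {task}"]
    for _ in range(max_steps):
        decision = llm("\n".join(history))
        # Assumed convention: a reply starting with "FINAL:" ends the loop.
        if decision.startswith("FINAL:"):
            return decision[len("FINAL:"):].strip()
        # Assumed convention: any other reply is "<tool_name> <tool_input>".
        tool_name, _, tool_input = decision.partition(" ")
        tool = tools.get(tool_name)
        result = tool(tool_input) if tool else f"unknown tool {tool_name!r}"
        history.append(f"{decision} -> {result}")
    return "stopped: step limit reached"


def scripted_llm(replies: List[str]) -> LLM:
    """Fake model that returns canned replies in order, for demonstration only."""
    remaining = iter(replies)
    return lambda prompt: next(remaining)
```

The workflow always makes three calls, while the agent loop runs for however many steps the model chooses, bounded only by `max_steps`. The step limit is a design choice added here so that a model that never answers cannot loop forever.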

Original Description

Anthropic’s Barry Zhang (Applied AI), Erik Schluntz (Research), and Alex Albert (Claude Relations) discuss the potential of AI agents, common pitfalls to avoid, and how to prepare for the evolving landscape.

Read more advice on building agents: https://www.anthropic.com/research/building-effective-agents

00:00 Introduction
00:26 Defining AI agents and workflows
02:55 Anatomy of an agent prompt
04:29 Behind the scenes stories
07:29 Why write about agents now
08:53 Overhyped and underhyped aspects of agents
09:57 Identifying useful applications of agents
10:47 Coding agents: Potential and challenges
12:47 The future of agents in 2025
16:26 Advice for developers exploring agents

Building effective AI agents requires understanding agent foundations, autonomous systems, and LLM engineering, as well as prompt engineering, tool use, and function calling. Developers should consider the model's perspective and provide clear instructions and documentation. Agents have potential for tasks like search, coding, and automation, but also come with challenges like verifiability and emergent behavior.

Key Takeaways

Define the agent's objective and scope
Design the agent's architecture and components
Implement prompt engineering and tool use
Test and evaluate the agent's performance
Refine and iterate on the agent's design
Consider the model's limitations and potential biases
Implement feedback loops for verification and improvement
Monitor and analyze the agent's behavior and results
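
The feedback loop takeaway echoes Erik's point in the transcript that a coding agent can converge because tests give it fresh signal on every iteration. A minimal sketch of that loop follows, assuming a hypothetical `propose_fix` callable in place of the model and a single hard-coded check on a hypothetical `add` function in place of a real test suite. It uses `exec` purely for brevity; a real system would run candidate code in a sandbox.

```python
from typing import Callable, Optional, Tuple

# Hypothetical model interface: given source and an error message, return new source.
ProposeFix = Callable[[str, str], str]


def run_tests(source: str) -> Optional[str]:
    """Run the candidate source; return an error message, or None if the check passes."""
    namespace: dict = {}
    try:
        exec(source, namespace)  # illustration only: never exec untrusted code unsandboxed
        assert namespace["add"](2, 3) == 5, "add(2, 3) should be 5"
    except Exception as exc:
        return f"{type(exc).__name__}: {exc}"
    return None


def iterate_until_green(source: str, propose_fix: ProposeFix, max_rounds: int = 5) -> Tuple[str, bool]:
    """Feed each test failure back to the fixer until the check passes or rounds run out."""
    for _ in range(max_rounds):
        error = run_tests(source)
        if error is None:
            return source, True
        # The error text is the new signal the next attempt gets to work with.
        source = propose_fix(source, error)
    # The last proposal has not been checked yet, so test it once more.
    return source, run_tests(source) is None
```

Without the `error` argument, every call to `propose_fix` would see the same input and the loop would add no new information, which is the "just noise" situation described in the conversation.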

💡 Understanding the model's perspective and providing clear instructions and documentation is crucial for effective agent design and development.
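
To make the documentation point concrete, here is a sketch contrasting a bare-bones tool definition with a documented one, plus a small check for missing descriptions. The dictionary shape (`name`, `description`, `input_schema` with JSON Schema properties) follows a common function-calling layout and should be verified against your provider's documentation. The currency tool and the `documentation_gaps` helper are hypothetical examples, not part of any library.

```python
from typing import Any, Dict, List

# Bare-bones: the model sees no explanation of what the tool or its inputs mean.
bare_tool: Dict[str, Any] = {
    "name": "conv",
    "description": "",
    "input_schema": {
        "type": "object",
        "properties": {"a": {"type": "number"}, "b": {"type": "string"}},
        "required": ["a", "b"],
    },
}

# Documented: the descriptions are prompt text that the model reads.
documented_tool: Dict[str, Any] = {
    "name": "convert_currency",
    "description": (
        "Convert an amount in US dollars to another currency. "
        "Use this whenever the user asks for a price in a non-USD currency."
    ),
    "input_schema": {
        "type": "object",
        "properties": {
            "amount_usd": {"type": "number", "description": "Amount in US dollars, e.g. 19.99."},
            "target_currency": {"type": "string", "description": "ISO 4217 code, e.g. 'EUR'."},
        },
        "required": ["amount_usd", "target_currency"],
    },
}


def documentation_gaps(tool: Dict[str, Any]) -> List[str]:
    """List the places where a tool definition gives the model nothing to read."""
    gaps: List[str] = []
    if not tool.get("description", "").strip():
        gaps.append("tool description is empty")
    properties = tool.get("input_schema", {}).get("properties", {})
    for name, spec in properties.items():
        if not spec.get("description", "").strip():
            gaps.append(f"parameter {name!r} has no description")
    return gaps
```

Running `documentation_gaps(bare_tool)` reports the empty tool description and both undocumented parameters, while the documented version reports nothing. A check like this could run in CI so that tools with parameters named A and B never reach the model.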
