Summarization Middleware (Python)

LangChain · Intermediate ·🤖 AI Agents & Automation ·7mo ago


Key Takeaways

LangChain's summarization middleware is a key component of context engineering pipelines, automatically triggered to prevent context overflow issues in long-running agentic applications. The middleware is customizable with knobs such as model, context size, and retention policy.

Full Transcript

Hey folks, it's Sydney from LangChain, and I'm super excited to be back with another episode in our Python middleware series. This time we're going to be covering our summarization middleware.

So context engineering is all the rage these days. But what does that actually mean? Context engineering is giving your model, which powers your agent, the right information and tools at the right time so that it can execute a given task. One of the most important tools that you can use to optimize your context for an agent is summarization. In particular, we're seeing agents run for longer and longer durations, which means that there are very long conversation histories with important information. But in order for your agent to perform well at the next step, it needs to have the right bits of information from the full conversation history. That's where summarization comes into play. With summarization, you can help your agent focus on the right information.

One real-world example of this that you might see in your day-to-day is that Claude Code auto-compacts your conversation history when it gets too long. So, if you're asking Claude Code to help you with a pretty involved refactor, a couple of minutes in, it might auto-compact. You can do the same thing in just a couple of lines of code with LangChain's new summarization middleware.

All right, so we're looking at LangChain's new middleware documentation, where we can see the docs for the built-in summarization middleware. It's got a pretty simple interface here. So when you're creating an agent with this primitive, you generally specify model and tools, and then you can pass in the summarization middleware with a couple of knobs that you can turn. So first is the model that you use for summarization, and then we have a context size that triggers summarization, which can be specified in tokens, messages, or a proportion of the available context window.
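The three ways of specifying the trigger (tokens, messages, or a proportion of the context window) can be sketched in plain Python. This is a minimal illustration of the idea, not LangChain's implementation: the `(unit, value)` tuple shape, the `should_summarize` function, and its parameters are all assumptions made for this example.

```python
from typing import Literal, Tuple

# Hypothetical shape for a context-size setting: (unit, value).
ContextSize = Tuple[Literal["tokens", "messages", "fraction"], float]


def should_summarize(
    trigger: ContextSize,
    token_count: int,
    message_count: int,
    context_window: int,
) -> bool:
    """Return True once the conversation has reached the trigger threshold."""
    unit, value = trigger
    if unit == "tokens":
        # Absolute token budget for the conversation history.
        return token_count >= value
    if unit == "messages":
        # Absolute number of messages in the history.
        return message_count >= value
    if unit == "fraction":
        # A fraction is resolved against the model's context window size.
        return token_count >= value * context_window
    raise ValueError(f"unknown trigger unit: {unit!r}")
```

Whichever unit is used, the check reduces to comparing the current history size against a threshold; only the fraction form needs to know the model's context window.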
And then we also have a retention policy, which is the context size that you want to keep. Let's jump over to LangChain's new API documentation to do a little bit more of a deep dive. So again, we can see those types here. I mentioned the context size. We also have things that you can customize, like the summary prompt, whether or not you want to trim the context before it goes into the summary model, and also a detailed token counter. You can also see all of our other pre-built middleware here, and then some utilities for building custom middleware.

Let's jump into some code. So for this example, we are going to build an agent that can retrieve information from Wikipedia. We're using this nice retriever tool from LangChain community, which we wrap in a fetch Wikipedia data tool. Then we use a custom summary prompt for our summarization middleware. Additionally, we configure a couple of those knobs, like the model we want to use for summarization, the fact that we want to trigger summarization when 70% of our context window has been used, and a retention policy of keeping 30%.

You might be wondering, how does LangChain know what 70% of the context window size is for the GPT-3.5 Turbo model, for example? Well, we just released a new model profiles package that has information about model capabilities, which we use under the hood to make decisions about agent behavior. So, that contains information like tool calling capacity, structured output features, context window size, and much more.

Let's test this out in the live debugger. So, for this demo, I'm going to fetch information on each of the founding fathers and their birthdays. I'm using this example because I anticipate that there is tons of content on the Wikipedia pages for people like George Washington, John Adams, Thomas Jefferson, etc. So, let's kick this first query off.
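To make the 30% retention figure concrete, here is a rough sketch of how a retention budget could be resolved from a model's context window and used to decide which messages to keep. Everything named here is hypothetical: the `MODEL_PROFILES` table, the `example-model` entry and its 200-token window, the four-characters-per-token estimate, and the `split_for_summary` helper are assumptions for illustration, not the model profiles package or the middleware's actual logic.

```python
from typing import Dict, List, Tuple

# Hypothetical stand-in for a model-capabilities lookup (values are made up).
MODEL_PROFILES: Dict[str, Dict[str, int]] = {
    "example-model": {"context_window": 200},
}


def approx_tokens(text: str) -> int:
    """Rough token estimate: about four characters per token (an assumption)."""
    return max(1, len(text) // 4)


def split_for_summary(
    messages: List[str], keep_fraction: float, model_name: str
) -> Tuple[List[str], List[str]]:
    """Split messages into (to_summarize, to_keep), keeping the newest ones.

    Walks backward from the most recent message and keeps messages until
    the budget (keep_fraction * context window) would be exceeded.
    """
    budget = keep_fraction * MODEL_PROFILES[model_name]["context_window"]
    used = 0
    cut = len(messages)  # index of the first kept message
    for i in range(len(messages) - 1, -1, -1):
        cost = approx_tokens(messages[i])
        if used + cost > budget:
            break
        used += cost
        cut = i
    return messages[:cut], messages[cut:]


# Two long, older messages followed by two short, recent ones.
history = ["a" * 400, "b" * 400, "c" * 80, "d" * 80]
to_summarize, to_keep = split_for_summary(history, 0.3, "example-model")
print(len(to_summarize), "to summarize;", len(to_keep), "kept verbatim")
```

Keeping the newest messages verbatim and summarizing the older ones is one plausible retention strategy; the sketch makes no claim about how the real middleware chooses its cut point.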
We can see a successful tool call to the fetch Wikipedia data tool with George Washington here, then a very verbose tool result, and then that summary message from the model. It looks like our before-model summarization has not yet been triggered. Let's kick this off for John Adams next. And then finally Thomas Jefferson. Great. So we can see that the summarization middleware was kicked off. Here we see a summary was generated. And then we use the fetch Wikipedia data tool to get information on Thomas Jefferson. And here's that final response.

All right. So let's look at the trace view for these as well. So we see turn one, which was George Washington, turn two, which was John Adams, and then turn three, which was Thomas Jefferson. And then we see the summarization middleware was kicked off. Here we can see our custom prompt with the messages to summarize, and then the output of that summary, which is then fed into the final model request with that Thomas Jefferson tool call as well.

Thanks for joining me for a quick demo of LangChain's new summarization middleware, which is particularly helpful if you are diving into the art of context engineering in order to optimize agent behavior. See you in the next one.
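The three-turn run in the demo can be mimicked with a small, self-contained simulation of a hook that runs before each model call. This is only a sketch under stated assumptions: `before_model`, `stub_summarize`, the 1,000-token context window, the four-characters-per-token estimate, and the placeholder tool output are all hypothetical, and a real summarizer would call a model rather than return a fixed string.

```python
from typing import Callable, List


def approx_tokens(text: str) -> int:
    """Rough token estimate: about four characters per token (an assumption)."""
    return max(1, len(text) // 4)


def before_model(
    history: List[str],
    context_window: int,
    trigger_fraction: float,
    keep_last: int,
    summarize: Callable[[List[str]], str],
) -> List[str]:
    """Return the history to send to the model, summarizing it if too long."""
    total = sum(approx_tokens(m) for m in history)
    if total < trigger_fraction * context_window:
        return history  # under the threshold: pass through unchanged
    older, recent = history[:-keep_last], history[-keep_last:]
    if not older:
        return history  # nothing old enough to summarize
    return [summarize(older)] + recent


def stub_summarize(messages: List[str]) -> str:
    """Stand-in for a call to a summarization model."""
    return f"Summary of {len(messages)} earlier messages"


history: List[str] = []
fired: List[bool] = []
for name in ["George Washington", "John Adams", "Thomas Jefferson"]:
    history.append(f"When was {name} born?")
    new_history = before_model(history, 1000, 0.7, 1, stub_summarize)
    fired.append(new_history is not history)  # True when a summary replaced old turns
    history = list(new_history)
    history.append("wiki text " * 140)  # verbose tool result, roughly 350 tokens
    history.append(f"{name}: see tool result")

print(fired)
```

With these made-up sizes, the first two turns stay under the 70% threshold and the third turn crosses it, which mirrors the order of events seen in the debugger.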

Original Description

Learn how to use LangChain's summarization middleware as a key component of your context engineering pipeline. This middleware is automatically triggered and helps to keep your long-running agentic applications running smoothly without facing context overflow issues.

Middleware docs: https://docs.langchain.com/oss/python/langchain/middleware/built-in#summarization

Code: https://gist.github.com/sydney-runkle/81ebecbc7c563b506ade810b26aa0b8c

Learn how to build agents with LangChain on LangChain Academy: https://academy.langchain.com/collections/quickstart/?utm_medium=social&utm_source=youtube&utm_campaign=q4-2025_youtube-academy-links_aw

Observe, evaluate, and deploy agents with LangSmith: https://smith.langchain.com/?utm_medium=social&utm_source=youtube&utm_campaign=q4-2025_youtube-links_aw

LangChain's summarization middleware helps optimize agent behavior by automatically triggering summarization to prevent context overflow issues. The middleware is customizable and can be used in Python applications.

Key Takeaways

Import necessary libraries
Create an agent with the summarization middleware
Configure middleware knobs such as model and context size
Test the agent in a live debugger
Verify the summarization middleware is triggered correctly

💡 The summarization middleware can be customized with knobs such as model, context size, and retention policy to optimize agent behavior.
