Summarization Middleware (Python)

LangChain · Intermediate · 🤖 AI Agents & Automation · 7mo ago

Key Takeaways

LangChain's summarization middleware is a key component of context engineering pipelines, automatically triggered to prevent context overflow issues in long-running agentic applications. The middleware is customizable with knobs such as model, context size, and retention policy.

Full Transcript

Hey folks, it's Sydney from LangChain, and I'm super excited to be back with another episode in our Python middleware series. This time we're going to be covering our summarization middleware.

So context engineering is all the rage these days. But what does that actually mean? Context engineering is giving the model that powers your agent the right information and tools at the right time so that it can execute a given task. One of the most important tools that you can use to optimize your context for an agent is summarization. In particular, we're seeing agents run for longer and longer durations, which means that there are very long conversation histories with important information. But in order for your agent to perform well at the next step, it needs to have the right bits of information from the full conversation history. That's where summarization comes into play. With summarization, you can help your agent focus on the right information.

One real-world example of this that you might see in your day-to-day is that Claude Code auto-compacts your conversation history when it gets too long. So, if you're asking Claude Code to help you with a pretty involved refactor, a couple of minutes in, it might auto-compact. You can do the same thing in just a couple of lines of code with LangChain's new summarization middleware.

All right, so we're looking at LangChain's new middleware documentation, where we can see the docs for the built-in summarization middleware. It's got a pretty simple interface here. So when you're creating an agent with this primitive, you generally specify model and tools, and then you can pass in the summarization middleware with a couple of knobs that you can turn. So first is the model that you use for summarization, and then we have a context size that triggers summarization, which can be specified in tokens, messages, or as a proportion of the available context window.
And then we also have a retention policy, which is the context size that you want to keep. Let's jump over to LangChain's new API documentation to do a little bit more of a deep dive. So again, we can see those types here. I mentioned the context size. We also have things that you can customize, like the summary prompt, whether or not you want to trim the context before it goes into the summary model, and then also a detailed token counter. You can also see all of our other pre-built middlewares here, and then some utilities for building custom middlewares.

Let's jump into some code. So for this example, we are going to build an agent that can retrieve information from Wikipedia. We're using this nice retriever tool from LangChain community, which we wrap in a fetch Wikipedia data tool. Then we use a custom summary prompt for our summarization middleware. Additionally, we configure a couple of those things, like the model we want to use for summarization, and then the fact that we want to trigger summarization when 70% of our context window size has been used, and then our retention policy is that we're keeping 30%.

You might be wondering, how does LangChain know what 70% of the context window size is for the GPT-3.5 Turbo model, for example? Well, we just released a new model profiles package that has information about model capabilities that we use under the hood to make decisions about agent behavior. So, that contains information like tool calling capacity, structured output features, context window size, and much more.

Let's test this out in the live debugger. So, for this demo, I'm going to fetch information on each of the founding fathers and their birthdays. I'm using this example because I anticipate that there is tons of content on the Wikipedia pages for people like George Washington, John Adams, Thomas Jefferson, etc. So, let's kick this first query off.
We can see a successful tool call to the fetch Wikipedia data tool with George Washington here, and then a very verbose tool result, and then that summary message from the model. It looks like our before-model summarization tooling is not yet triggered. Let's kick this off for John Adams next. And then finally Thomas Jefferson. Great. So we can see that the summarization middleware was kicked off. Here we see a summary was generated. And then we use the fetch Wikipedia data tool to get information on Thomas Jefferson. And here's that final response.

All right. So let's look at the trace view for these as well. So we see turn one, which was George Washington, turn two, which was John Adams, and then turn three, which was Thomas Jefferson. And then we see the summarization middleware was kicked off. Here we can see our custom prompt with the messages to summarize, and then the output of that summary, which is then fed into the final model request with that Thomas Jefferson tool call as well.

Thanks for joining me for a quick demo of LangChain's new summarization middleware, which is particularly helpful if you are diving into the art of context engineering in order to optimize agent behavior. See you in the next one.
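The trigger and retention knobs described in the transcript can be illustrated with a small, library-free sketch. This is not LangChain's implementation: the `ContextSize` alias, the function names, and the four-characters-per-token estimate are all assumptions made for illustration. It only shows how a threshold given in tokens, messages, or a fraction of the context window might decide when to summarize and which recent messages to keep.

```python
from typing import Literal

# Hypothetical type: a context size given as tokens, messages, or a fraction
# of the model's context window (the three options named in the talk).
ContextSize = tuple[Literal["tokens", "messages", "fraction"], float]


def count_tokens(messages: list[str]) -> int:
    """Crude stand-in for a real token counter: about 4 characters per token."""
    return sum(len(m) // 4 for m in messages)


def should_summarize(
    messages: list[str], trigger: ContextSize, context_window: int
) -> bool:
    """Return True once the history reaches the trigger threshold."""
    kind, value = trigger
    if kind == "messages":
        return len(messages) >= value
    if kind == "tokens":
        return count_tokens(messages) >= value
    if kind == "fraction":
        return count_tokens(messages) >= value * context_window
    raise ValueError(f"unknown context size kind: {kind!r}")


def split_for_summary(
    messages: list[str], keep: ContextSize, context_window: int
) -> tuple[list[str], list[str]]:
    """Split history into (messages to summarize, newest messages to keep)."""
    kind, value = keep
    if kind == "messages":
        cut = max(len(messages) - int(value), 0)
        return messages[:cut], messages[cut:]
    if kind == "tokens":
        budget = value
    elif kind == "fraction":
        budget = value * context_window
    else:
        raise ValueError(f"unknown context size kind: {kind!r}")
    # Walk backwards from the newest message until the keep budget is spent.
    used = 0
    kept = 0
    for message in reversed(messages):
        cost = count_tokens([message])
        if used + cost > budget:
            break
        used += cost
        kept += 1
    cut = len(messages) - kept
    return messages[:cut], messages[cut:]


# Usage: three large tool results in a (deliberately tiny) 400-token window.
history = ["a" * 400, "b" * 400, "c" * 400]
if should_summarize(history, ("fraction", 0.7), context_window=400):
    to_summarize, to_keep = split_for_summary(
        history, ("fraction", 0.3), context_window=400
    )
```

In the demo, the first two lookups stay under the threshold and the third pushes the history over it, which matches the behavior this sketch models: older messages are handed to the summary model and only the newest ones are carried forward verbatim.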

Original Description

Learn how to use LangChain's summarization middleware as a key component of your context engineering pipeline. This middleware is triggered automatically and helps keep your long-running agentic applications running smoothly without context overflow issues.
Middleware docs: https://docs.langchain.com/oss/python/langchain/middleware/built-in#summarization
Code: https://gist.github.com/sydney-runkle/81ebecbc7c563b506ade810b26aa0b8c
Learn how to build agents with LangChain on LangChain Academy: https://academy.langchain.com/collections/quickstart/?utm_medium=social&utm_source=youtube&utm_campaign=q4-2025_youtube-academy-links_aw
Observe, evaluate, and deploy agents with LangSmith: https://smith.langchain.com/?utm_medium=social&utm_source=youtube&utm_campaign=q4-2025_youtube-links_aw

LangChain's summarization middleware helps optimize agent behavior by automatically triggering summarization to prevent context overflow issues. The middleware is customizable and can be used in Python applications.

Key Takeaways
  1. Import necessary libraries
  2. Create an agent with the summarization middleware
  3. Configure middleware knobs such as model and context size
  4. Test the agent in a live debugger
  5. Verify the summarization middleware is triggered correctly
💡 The summarization middleware can be customized with knobs such as model, context size, and retention policy to optimize agent behavior.
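The first three steps above might look roughly like the sketch below. It assumes a recent LangChain release in which `create_agent` and `SummarizationMiddleware` accept `trigger` and `keep` as `(kind, value)` tuples, as described in the linked middleware docs; check those names against your installed version. The model identifiers are placeholders rather than the ones used in the video, and `build_agent` is a hypothetical helper.

```python
# Assumed settings mirroring the demo: summarize once 70% of the context
# window is used, and keep roughly 30% of it afterwards.
# The model id is a placeholder, not the one from the video.
SUMMARIZATION_SETTINGS = {
    "model": "openai:gpt-4o-mini",
    "trigger": ("fraction", 0.7),
    "keep": ("fraction", 0.3),
}


def build_agent(tools, agent_model="openai:gpt-4o"):
    """Hypothetical helper: create an agent with summarization middleware.

    The LangChain imports are local so that this module can be imported
    without LangChain installed; calling the function requires the package
    and provider credentials.
    """
    from langchain.agents import create_agent
    from langchain.agents.middleware import SummarizationMiddleware

    return create_agent(
        model=agent_model,
        tools=tools,
        middleware=[SummarizationMiddleware(**SUMMARIZATION_SETTINGS)],
    )
```

Keeping the settings in one dictionary makes it easy to switch between token-based, message-based, and fraction-based thresholds while testing, since only the tuples change.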
