Summarization Middleware (Python)
Key Takeaways
LangChain's summarization middleware is a key component of context engineering pipelines, automatically triggered to prevent context overflow issues in long-running agentic applications. The middleware is customizable with knobs such as model, context size, and retention policy.
Full Transcript
Hey folks, it's Sydney from Linkchain and I'm super excited to be back with another episode in our Python middleware series. This time we're going to be covering our summarization middleware. So context engineering is all the rage these days. But what does that actually mean? Context engineering is giving your model, which powers your agent, the right information and tools at the right time so that it can execute a given task. One of the most important tools that you can use to optimize your context for an agent is summarization. In particular, we're seeing agents run for longer and longer durations of time, which means that there is very long conversation histories with important information. But in order for your agent to perform well at the next step, it needs to have the right bits of information from the full conversation history. That's where summarization comes into play. With summarization, you can help your agent focus on the right information. One real world example of this that you might see in your day-to-day is that Claude Code autocompacts your conversation history when it gets too long. So, if you're asking Claude Code to help you with a pretty involved refactor, a couple minutes in, it might autocompact. You can do the same thing in just a couple lines of code with Langchain's new summarization middleware. All right, so we're looking at Lingchain's new middleware documentation where we can see the docs for the built-in summarization middleware. It's got a pretty simple interface here. So when you're creating an agent with this primitive, you generally specify model and tools and then you can pass in the summarization middleware with a couple of knobs that you turn. So first is the model that you use for summarization and then we have a context size that you trigger with that can either be specified in tokens, messages or proportion of the available context window. And then we also have a retention policy that's the context size that you want to keep. Let's jump over to Langchain's new API documentation to do a little bit more of a deep dive. So again, we can see those types here. I mentioned the context size. We also have things that you can customize like the summary prompt, whether or not you want to trim the context before it goes into the summary model. Um, and then also a detailed token counter. You can also see all of our other pre-built middlewares here and then some utilities for building custom middlewares. Let's jump into some code. So for this example, we are going to build an agent that can retrieve information from Wikipedia. We're using this nice retriever tool from Langchain community which we wrap in a fetch Wikipedia data tool. Then we use a custom summary prompt for our summarization middleware. Additionally, we configure a couple of those things like the model we want to use for summarization and then the fact that we want to trigger summarization when 70% of our context window size has been used and then our retention policy is that we're keeping 30%. You might be wondering, how does Langchain know what 70% of the context window size is for the GPT3.5 turbo model, for example? Well, we just released a new model profiles package that has information about model capabilities that we use under the hood to intuitively make decisions about agent behavior. So, that contains information like tool calling capacity, structured output features, context window size, and much more. Let's test this out in the live debugger. So, for this demo, I'm going to fetch information on each of the founding fathers and their birthdays. I'm using this example because I anticipate that there is tons of content on the Wikipedia pages for people like George Washington, John Adams, Thomas Jefferson, etc. So, let's kick this first query off. We can see a successful tool call to the fetch Wikipedia data tool with George Washington here and then a very verbose tool result and then that summary message from the model. It looks like our before model summarization tooling is not yet triggered. Let's kick this off for John Adams next. And then finally Thomas Jefferson. Great. So we can see that the summarization middleware was kicked off. Here we see a summary was generated. And then we use the fetch Wikipedia data tool to get information on Thomas Jefferson. And here's that final response. All right. So let's look at the trace view for these as well. So we see turn one which was George Washington. Turn two which was John Adams. and then turn three which was Thomas Jefferson. And then we see the summarization middleware was kicked off. Here we can see our custom prompt with the messages to summarize and then the output of that summary which is then fed into the final model request with that Thomas Jefferson tool call as well. Thanks for joining me for a quick demo of Lingchain's new summarization middleware, which is particularly helpful if you are diving into the art of context engineering in order to optimize agent behavior. See you in the next one.
Original Description
Learn about how to use LangChain's summarization middleware as a key component of your context engineering pipeline. This middleware is automatically triggered and helps to keep your long running agentic applications running smoothly without facing context overflow issues.
Middleware docs: https://docs.langchain.com/oss/python/langchain/middleware/built-in#summarization
Code: https://gist.github.com/sydney-runkle/81ebecbc7c563b506ade810b26aa0b8c
Learn how to build agents with LangChain on LangChain Academy: https://academy.langchain.com/collections/quickstart/?utm_medium=social&utm_source=youtube&utm_campaign=q4-2025_youtube-academy-links_aw
Observe, evaluate, and deploy agents with LangSmith: https://smith.langchain.com/?utm_medium=social&utm_source=youtube&utm_campaign=q4-2025_youtube-links_aw
Watch on YouTube ↗
(saves to browser)
Sign in to unlock AI tutor explanation · ⚡30
Playlist
Uploads from LangChain · LangChain · 0 of 60
← Previous
Next →
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
Chat With Your Documents Using LangChain + JavaScript
LangChain
LangChain SQL Webinar
LangChain
LangChain "OpenAI functions" Webinar
LangChain
LangSmith Launch
LangChain
LangChain x Pinecone: Supercharging Llama-2 with RAG
LangChain
LangChain Expression Language
LangChain
Building LLM applications with LangChain with Lance
LangChain
Benchmarking Question/Answering Over CSV Data
LangChain
LangChain "RAG Evaluation" Webinar
LangChain
Fine-tuning in Your Voice Webinar
LangChain
Tabular Data Retrieval
LangChain
Building an LLM Application with Audio by AssemblyAI
LangChain
Superagent Deepdive Webinar
LangChain
Lessons from Deploying LLMs with LangSmith
LangChain
Shortwave Assistant Deepdive Webinar
LangChain
Cognitive Architectures for Language Agents
LangChain
Effectively Building with LLMs in the Browser with Jacob
LangChain
Data Privacy for LLMs
LangChain
"Theory of Mind" Webinar with Plastic Labs
LangChain
LangChain Templates
LangChain
Using Natural Language to Query Postgres with Jacob
LangChain
Building a Research Assistant from Scratch
LangChain
Benchmarking RAG over LangChain Docs
LangChain
Skeleton-of-Thought: Building a New Template from Scratch
LangChain
Benchmarking Methods for Semi-Structured RAG
LangChain
LangSmith Highlights: Getting Started
LangChain
LangSmith Highlights: Debugging
LangChain
LangSmith Highlights: Datasets
LangChain
LangSmith Highlights: Evaluation
LangChain
LangSmith Highlights: Human Annotation
LangChain
LangSmith Highlights: Monitoring
LangChain
LangSmith Highlights: Hub
LangChain
SQL Research Assistant
LangChain
Getting Started with Multi-Modal LLMs
LangChain
Build a Full Stack RAG App With TypeScript
LangChain
Auto-Prompt Builder (with Hosted LangServe)
LangChain
LangChain v0.1.0 Launch: Introduction
LangChain
LangChain v0.1.0 Launch: Observability
LangChain
LangChain v0.1.0 Launch: Integrations
LangChain
LangChain v0.1.0 Launch: Composability
LangChain
LangChain v0.1.0 Launch: Streaming
LangChain
LangChain v0.1.0 Launch: Output Parsing
LangChain
LangChain v0.1.0 Launch: Retrieval
LangChain
LangChain v0.1.0 Launch: Agents
LangChain
Build and Deploy a RAG app with Pinecone Serverless
LangChain
Hosted LangServe + LangChain Templates
LangChain
LangGraph: Intro
LangChain
LangGraph: Agent Executor
LangChain
LangGraph: Chat Agent Executor
LangChain
LangGraph: Human-in-the-Loop
LangChain
LangGraph: Dynamically Returning a Tool Output Directly
LangChain
LangGraph: Respond in a Specific Format
LangChain
LangGraph: Managing Agent Steps
LangChain
LangGraph: Force-Calling a Tool
LangChain
LangGraph: Multi-Agent Workflows
LangChain
Streaming Events: Introducing a new `stream_events` method
LangChain
Building a web RAG chatbot: using LangChain, Exa (prev. Metaphor), LangSmith, and Hosted Langserve
LangChain
OpenGPTs
LangChain
Open Source RAG with Nomic's New Embedding Model (and ChromaDB and Ollama)
LangChain
LangGraph: Persistence
LangChain
More on: Tool Use & Function Calling
View skill →Related AI Lessons
⚡
⚡
⚡
⚡
My agent kept reading data it wasn't allowed to. The prompt was never going to stop it.
Dev.to AI
8 Must-Know AI Chatbot Tools That Actually Help Small Businesses
Dev.to AI
Agent-Ready Commerce, Part 9: Evidence and Audit Are Part of the Product
Dev.to AI
Agent-Ready Commerce, Part 8: Generated Claims Need Review, Evidence, and Expiry
Dev.to AI
🎓
Tutor Explanation
DeepCamp AI