LangGraph Memory Management

Analytics Vidhya · Beginner ·🤖 AI Agents & Automation ·2mo ago

Key Takeaways

Teaches memory management in AI workflows using summarization and memory trimming techniques

Full Transcript

Hi there, it's good to see you again. Let's continue on this journey of understanding the fundamentals of Langra. In this session, we are going to talk about state and memory management. If you recolct in our previous sessions, we have already seen how you can design and develop a basic graphical workflow using Langraph. In the last session, we have designed an agent workflow wherein a node contained an LLM agent which was taking in the user input and responding back based upon the question. Here we are going to dive deeper into the concept of state and memory management. So let's get started. So why state matters, right? That's the obvious question that comes to everyone's mind. Now we know that LLM apps are nothing but multi-step workflows, not single calls. So when you are interacting with the agent, it will continue talking to you until you are satisfied with the responses. Right? So each step needs access to the previous inputs and the decisions that you have taken previously. So if you are designing or if you are trying to talk to an agent to create let's say u an itinary for you for Indonesia while interacting you might have certain preferences based upon your travel dates based upon what type of location you prefer based upon what type of transportation you like right so how many days you want to travel and every single interaction will keep on improving your relevant uh you know itinerary for your which is customized for you. So for for the agent to be able to understand what you like and you do not dislike it needs to access all the previous decisions that has been made in order to improve your overall itinary and state makes this data flow explicit and debug friendly. So using states we as we know states can store all of the previous information into a messages queue and then you can pass that to the agent when you are trying to respond to the most recent question. Right? So in this particular next question that comes to our mind is that okay why or what exactly is state? Now I have already explained this multiple times that state is nothing but a dictionary like object that travels through the entire graph. So we have discussed this previously that what exactly is a state? It's nothing but a dictionary like object which basically models the information that is supposed to travel through the entire graph. And typically a state will hold all the messages right what are the results what are the flags and everything internally which can be used by successive nodes in order to respond accordingly and each nodes is going to read from the state and writes backs whatever update it has made to that state object right and as you progress through your graph the state keeps on getting more and more enriched until you have received the final response from the agent. Now typically how do you design a good state? The first thing you always have to remember is always start simple, right? So just start with messages list and result field. Do not add anything complicated there and then you can group all the related fields together. So say for example, if you have make if you have retrieved several results, you can basically add the field to include all the results from there. Whatever the tool outputs you have received, you can group it together into the relevant field for tool outputs and such. But always make sure you start simple and keep adding fields as you progress down the line into your entire workflow. And always use the preferred type structure of either type dictionary, pentic, base models and such. You know, barebone dictionaries to start with. For larger projects, always prefer having these type structures as compared to the basic dictionaries in Python. Now the next question that is asked is what are the drawbacks of this particular thing. Now in in order to understand obviously you can think about keep on talking to agent as long as you have not received your response. But the major problem here is that how do you make the system remember the previous turn right? So the obvious way in which we can do it is basically keep a running list of all the messages in state right and that is what memory means. So memory is nothing but an a data structure that can be used to let system know what the previous conversation looked like. That's all memory is. And the simplest form of implementing memory is to use a running list of messages in your state. And at every single agent invocation, the agent will run the conversation and append itself when it is trying to call it. And then it is going to answer the query based upon the context it has in the entire memory of the agent invocation. Now memory in langraph typically flows like this. So as you keep on interacting with the agent, the messages field in the state keeps on growing as user and agent messages. So user messages is the input coming from the user and agent messages are basically the assistant responses that we add there. Now you can use whatever field you want to demarcate this but typically this is the con convention which is followed and you can have dedicated memory nodes that summarize or trim the history. Now what why do we need this? I'll explain this in the next slide but typically it's always a good idea to have a node which keeps a track of the length of your memory. Right? You cannot have a memory of infinite length. Otherwise, the model will basically run out of tokens to summarize whatever information you have passed it. Every single large language model has a limit on how many tokens it can read at one single time and you cannot pass more than that. Your model will start throwing an error. Hence, in order to avoid this, it's always a good idea to have a summarizer node in place which can always trim based upon how many messages are there in the current stack. And memory can be shared across multiple agent nodes. I think we have discussed this multiple times. Now why do we need to trim? Because large histories can bloat up the context window and cost. If you keep on adding more and more tokens in order to respond, the agent will read it will have to read more and more. Right? The input tokens will increase which will include increase the cost and also bloat up the context window as I spoke previously and the most common strategy in order to avoid this problem is to keep n last terms. So this n can be let's say 10 meaning when you are calling the agent right you will only use the last 10 messages and nothing more. So you will reduce or you will reserve your input to have last n messages only and nothing more. And this long-term memory can be stored in a vector database to keep the state lean. So in order to avoid having everything put in your state, you can integrate your system with an external vector database and store everything there rather than having it in your state. Because remember state is basically used or it's going to utilize your RAM. If your state grows your your memory consumption will increase which can basically lead to you know if if your if if it's not restricted then it can eventually lead to crash also right. So it's always a good idea to have a long-term memory stored in a vector database. Keep your state lean which will basically only contain last 10 turns and anything beyond that you keep appending your information into a vector database and keep your state as lean as possible. Now in in the hands-on part what we are going to cover. So in this hands-on we are going to see how can you extend a chat state with messages history. We'll also see how you can show the state evolve as the graph is running. Right? So we'll also see how you can actually visualize as you're as you're you know so here as we continue we'll also see how the state is evolving as you interact more and more with the agent. And finally we'll add a simple memory behavior wherein we are going to use the last n information not everything else. Right? Right. So we'll keep the memory lean as we discussed to ensure that it is storing only the most recent end messages and nothing more. So let's jump into the hands-on now. All right. So now we are ready to perform this hands-on notebook. And as previously mentioned here we are going to first load the environment variables which will allow us to connect with our OpenAI API key. So I'm loading this environment variable and then I'm setting up my client. It's it's exactly similar to the previous function with minimal changes. Let me explain those as well. So here you can use any model. So here we have used GPT4 mini but you can replace with GPT5 nano also. No doesn't matter. You can use any available model in the free tier of OpenAI. Here we are going to set up our client first which is going to read the environment variable of OpenAI API key. Then here we are going to get the messages which are all the information that we have conversated with our agent till now. We'll design a transcript. Transcript is nothing but a single string which compresses all the messages together. And then we are going to use the completions API from OpenAI to complete the information. Right? So again we are setting up a system role asking the agent to behave in a certain way and the user input will be the conversation history which is so far created which will contain your transcript above and the most recent question that the user has asked in the transcript itself. The transcript will contain the most recent information also. And then we are asking the agent to respond back in up to 100 words with the answer which a user has requested and we'll return back just that contract. Next up we are initializing our state. So here you can see the state is very pretty much simple. We have messages, input and the result. Right? So we are not you know complicating it too much. Simple state which we have initialized up until now. And this time we are going to design three nodes. The first node will be the simple user node which will initiate our state with the user input. So in this case you can see we are designing the user input appending the user message that we have received and we are basically adding the messages also that we have created here. Then comes the agent node which designs the messages here. So here we are going to take the messages will then call the LLM with the input messages that we have received. And then whatever reply we have received from the agent, we'll add it in our messages queue as the assistant response and we'll return back the updated state with the updated messages and the newly found result. And last but not the least, this is the trim memory node which basically trims the state with the last n messages. So here n is equal to six, right? Right? So we are only keeping the previous six messages in state and then we are going to get the number of messages. We are going to check the length of messages. If those messages have increased then we are only keeping the last n messages here. Right? So that's what pretty simple thing. Continuing that now it's time to build up our graph. So let me first create this and let's design our graph. Again the first part is to add all of our nodes. User node agent node and trim node. Then connect everything by setting up the entry point. Adding the edge between user and agent. Adding the edge between agent and trim. And finally adding an edge between trim and end to end the conversation. We'll compile our graph and see what the final app structure looks like. And now you see we have three nodes user, agent and trim which will be called sequentially. And then we are going to ask a few questions to our agent in a loop. So we start off by a simple question what exactly is langraph? Then the follow-up question is how does it handle state? Then the next question is what about memory in long conversations and last is how can I integrate lang graph into an existing project. So we have these series of four questions that we want to ask one by one. So we write a simple loop just to simulate that the agent the state the user input is the most recent question that we have asked and here we are going to then invoke our app with the most recent question. That's the user input right? We'll basically then log it to see what the question is and then we'll take the message in the state and we'll also print each and every message that we have after the previous message has been responded back and then we are going to finally print how many messages have been stored in the state at current point of time. Let's run this and see this in action. And remember as always this might take a few seconds to run because we are calling the agent API every single time from OpenAI. Right? So this is the first response that we have. What is lang graph? This is basically the output that we have received after the first question. The loop is still running. Right? So it's still in the process. Then here is the follow-up question that we had asked. So this was the first question. What is langraph? The response received from that. And then the next question is how does it handle state? The agent basically responded back as state as both data and process. So basically whatever the information it wants. Then we have the third question. What is lang graph? After that the third question is what about memory in long conversation. So lang graph keeps long conversation as a graph state which we are currently seeing and we can then trim it up as well. And how does it handle state? Right? That was the last question and you can see this is the entire conversation around it. And you will notice the total messages which are stored in state are only six. So any given point of time it will not store more than six messages. that includes user input as well as the assistant responses. And that's a quick demo on how can you handle memory and how can you trim your memory to limit the token count and the prevent the issue of bloating of the conversation history using a simple agent node or a trim node in your workflow. So that's all for this session. I'll see you in the next video. Thank you.

Original Description

Description: Discover how to manage "Memory" in AI workflows. This video teaches you how to keep your agents context-aware while preventing context window bloat by using summarization and memory trimming techniques. Chapters: 0:00 Why is State Management Important? 1:30 Designing a Lean and Scalable State 2:45 Introduction to Memory in LangGraph 3:30 Handling Long Conversations & Token Costs 4:20 Memory Trimming Strategies 5:45 Hands-on: Implementing a Trim Memory Node 7:30 Comparing Memory in State vs. Vector DBs 9:00 Loop Execution: Multi-turn Conversations 11:00 Final Trace: Inspecting State History #ArtificialIntelligence #MachineLearning #MemoryManagement #Python #AITips
Watch on YouTube ↗ (saves to browser)
Sign in to unlock AI tutor explanation · ⚡30

Related AI Lessons

Lumo Is a Privacy-Focused AI Chatbot, With Clear Limits
Learn about Lumo, a privacy-focused AI chatbot with no chat logs, and understand its implications on user data protection
Dev.to · Simon Paxton
I Let 5 AI Agents Shop For Me in 2026. It Went About as Well as You’d Expect.
Learn from an experiment where 5 AI agents were used to shop for everyday items, highlighting what works and what doesn't in AI-powered shopping
Medium · AI
The Governance Gap Nobody's Measuring
Learn how to identify and address the governance gap in AI systems, where configuration changes can lead to unintended consequences, and why it matters for ensuring accountability and transparency
Medium · AI
My agent kept reading data it wasn't allowed to. The prompt was never going to stop it.
Learn how to secure autonomous agents with proper credential management to prevent unauthorized data access
Dev.to AI

Chapters (9)

Why is State Management Important?
1:30 Designing a Lean and Scalable State
2:45 Introduction to Memory in LangGraph
3:30 Handling Long Conversations & Token Costs
4:20 Memory Trimming Strategies
5:45 Hands-on: Implementing a Trim Memory Node
7:30 Comparing Memory in State vs. Vector DBs
9:00 Loop Execution: Multi-turn Conversations
11:00 Final Trace: Inspecting State History
Up next
Building Great Agent Skills: The Missing Manual
AI Engineer
Watch →