Garbage Collection in Python: Speed Up Your Code

NeuralNine · Beginner ·💻 AI-Assisted Coding ·2y ago

Key Takeaways

The video discusses garbage collection in Python, explaining how it works and providing methods to manually speed up code, including using the gc module and setting thresholds.
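
The two mechanisms named in this summary, reference counting and the cycle-detecting collector in the gc module, can be sketched in a few lines. This is a minimal illustration that assumes CPython; the Link class here is a stripped-down stand-in invented for the example, not the exact class from the video.

```python
import gc
import sys


class Link:
    """Tiny node type used only for this illustration (hypothetical)."""

    def __init__(self, next_link=None):
        self.next_link = next_link


# Reference counting: every extra reference raises the count by one.
a = Link()
before = sys.getrefcount(a)   # includes the temporary reference held by the call itself
holder = [a]                  # the list now references a as well
after = sys.getrefcount(a)
print(after - before)

# A reference cycle: the object points to itself, so its reference count
# cannot drop to zero even after the last name is deleted.
gc.disable()                  # keep automatic collection out of the demo
gc.collect()                  # start from a clean state
l4 = Link()
l4.next_link = l4
del l4
found = gc.collect()          # the cycle detector finds the unreachable object(s)
print(found >= 1)
gc.enable()
```

The number returned by gc.collect() can vary between Python versions, which is why the sketch only checks that at least one unreachable object was found.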

Full Transcript

What is going on, guys? Welcome back. In this video today we're going to learn about garbage collection in Python, how it works, and what we can do to potentially speed up our code. So let us get right into it. [Music]

All right, so we're going to discuss garbage collection in this video today: what the basic idea behind it is, how it's implemented in Python, what the basic mechanism behind it is, and what we can do manually when it comes to garbage collection to maybe speed up our code and our programs. But before we get into any of the coding, I would like to briefly explain on a theoretical level what is happening behind the scenes when it comes to garbage collection and what reference counting is in Python. I don't want to go too deep into the technical details; I want to keep it concise and simple here.

For this I have Paint opened up. The idea of reference counting is quite simple. Python collects garbage based on reference counting. I have some object here, for example object a, and this object a can be referenced by other objects. A very simple example: let's say I have an object b, and b is a list. A list contains many elements, and a could be part of b, which means b is referencing a. What Python does is keep, for each object, something called a reference count: how many references are there to this object a? So you could say this adds one reference. Maybe I also have some object c, which could be a class for example, and one attribute of the class, for example c.name, could be pointing to a as well. That would also be plus one when it comes to reference counting.

The idea now is that when an object is completely unreachable, we can destroy it, because we don't need it anymore. So if this connection no longer exists, and this connection no longer exists, and there's no way to reach a, and we delete a, then the garbage collector can destroy a without any problems. That's the basic explanation without too much technical detail.

Now, it's not necessarily a problem, but the thing is that sometimes we can have cyclical references. We can have something like link one pointing to link two, link two pointing to link three, and then link three pointing to L1 again, and then some external object a pointing to L1. Now I can delete L3, I can delete L2, and I can delete L1; however, these objects will still remain in memory. Why? Because there is a path. L1 will not be destroyed, because its reference count is at least one: a is pointing to L1. And because L1 is not going to be destroyed, from a I can reach L1, from L1 I can reach L2, and from L2 I can reach L3. As long as this path exists, I cannot really destroy these objects. I can delete the variables L3, L2, and L1 in Python, but I cannot destroy the objects, because I can still reach them. Whereas, and this is all from the documentation, from the developer guide, if you have L4, link four, pointing to itself, and then you delete L4, you can completely destroy it because there's nothing else pointing to L4. Garbage collection takes care of stuff like this: it analyzes which objects are unreachable and which objects are still reachable, and all the unreachable objects are destroyed so that we don't waste any memory. That's the basic idea of garbage
collection. I hope this was not too complicated.

The important thing to understand now is that we work with three so-called generations. When it comes to garbage collection in Python, we have generation 0, generation 1, and generation 2. Every time an object is created and space is allocated for it, it belongs to generation 0. So I can have an object a, an object b, an object c, and so on, and after a certain threshold of allocations the garbage collector comes in and analyzes which of these objects are still reachable and which are not. Maybe it realizes a is still there, b is still reachable, but c is completely useless: it's unreachable, so we can destroy it. This is one iteration, or one cycle, of garbage collection. c is destroyed; a and b are not destroyed, they persist, and because of that they're moved to generation 1.

Why do we move them out of generation 0 into generation 1? There is an assumption that most objects have a very short lifespan, but those that have already persisted for quite a long time are more likely to persist even longer. Since a and b are still around after this cycle, we expect them to be around for even longer, so we don't have to check them all the time; we check them less often than the objects in generation 0. Then maybe we allocate space for new objects d, e, f, and so on. After the threshold is reached again, we check again which are reachable. Maybe these two are no longer reachable, and d is moved to generation 1.

After we do this 10 times (those are the default values), a certain threshold is reached for generation 1, and then the garbage collector goes into generation 1 and analyzes which of these objects are still around. So a, b, d: are all of
these still reachable? Maybe we see that b is no longer reachable but a and d are still around, and after these 10 rounds of generation 0 garbage collection we can then move a and d to generation 2, which is checked even less often: it's checked after 10 garbage collections in generation 1. I hope this is not too confusing.

The basic idea is that for each of these generations you have a threshold. For the first generation you have a threshold saying: after 700 allocations (this is the default value in Python), perform garbage collection, analyze which of these objects are reachable and which are no longer reachable, and destroy all the unreachable objects. When you survive this garbage collection, you're moved to generation 1. After 10 garbage collections in generation 0, which is the threshold here, a garbage collection is performed for generation 1. If you survive there as well, you're moved to generation 2, and that is this threshold here: when you do garbage collection 10 times in generation 1, you do it once in generation 2. That's the basic idea.

All right, now let us go ahead and play around with these concepts in Python directly. For this I'm going to open up an interactive Python shell by typing either python or python3 into the terminal, and then we're going to import the two modules sys and gc, which stands for garbage collection. Both are core Python modules; no need to install anything. One function provided by the sys module is getrefcount, and it tells us how many references there are to a specific object. For example, if I say a = "hello world" and then run sys.getrefcount(a), you can see we have two references here. If I now say my_list is an empty list and then my_list.append(a), my_list is referencing a, and when I run this again I can see I have one more reference. I can also see where the reference is coming from by running gc.get_referrers(a), and you can see my_list is part of it. So that is also something you can explore here.

Now the interesting thing is that you can call the function gc.get_threshold() to see what the current thresholds are: 700, 10, 10, which is what I just explained in Paint. You can also change these, so you can cause garbage collection to happen more often or less often. Of course garbage collection takes some time, and if you run it less often you can speed up your code in certain circumstances, though not always. We can set the thresholds to something else by calling gc.set_threshold with, for example, 1000, 20, 30, and when I run gc.get_threshold() again you can see those are now the new values. I can also see what the current state is: gc.get_count() tells me we have 415 allocations, 10 generation 1 garbage collections, and one generation 2 garbage collection that have already happened.

If you want to see when garbage collection is happening, you can enable the so-called debug mode. You can say gc.set_debug(True), and this shows you a message every time a garbage collection is performed, telling you exactly when and how it's happening. For this we're going to move away from the interactive shell into an actual script. We start again by importing gc and sys, and then we call gc.set_debug(True) so that we can see what's happening. I'm going to use the example that I showed you in Paint, which is from the Python dev guide: a class Link. This class has a constructor which takes a parameter next_link and also a value. In my case this is a little bit modified; this is not the exact example from
the dev guide, but the idea is that you have self.next_link equal to next_link and self.value equal to value. Then we're going to have a representation dunder method, __repr__, which just returns the value, so that we get a string instead of this __main__ object at a certain address. So that is our class.

What we're going to do now is cause garbage collection to happen quite often, first so that we can see the output, and second so that we can see that doing garbage collection less often can actually speed up the process. This is a very artificially crafted example, but there are also actual use cases where reducing the number of garbage collection checks speeds up your code.

We're going to create one link, which is going to be our main link. It has no link that it's pointing to, and it has the value "main link". Then we say my_list is an empty list, and for i in range of a large number (which one did I pick here? 5 million) we just create a new link l_temp. This l_temp is a Link pointing to the main link with the value "L", and it's added to my_list with my_list.append(l_temp), so we create quite a lot of references here. We're also going to import the Python module time and measure how long this takes: start is equal to time.perf_counter(), end is equal to time.perf_counter(), and then print end minus start to see how long it takes with the default settings.

So we have this now, and I can run it. Maybe we can stop this (I can disable the debug output afterwards), but you can see now
how the garbage collection is being done. You can see that garbage collection is happening all the time, and it's also being logged. You can also see that it happens in different generations: 0, 0, 0, 1, 0, 0, 0, 1. So after 10 collections of generation 0 it happens in generation 1, and after 10 collections of generation 1 it happens in generation 2. You can analyze this whole process if you set debug to True, but of course this produces very verbose output. If we disable that, we just get our result, which is 4.31 seconds. I can run this again to see that this is roughly how long it takes in this case with all the garbage collection being done.

Now we can go ahead and change the thresholds. I can call gc.set_threshold and say: instead of every 700 allocations, do garbage collection every 20,000 allocations; for generation 1, do it only every 50 times; and for generation 2, do it every 100 times. If I run this now, you will see that it takes much less time. I can also take this to an extreme by calling gc.disable(). This disables garbage collection completely; it basically says don't do any garbage collection at all, and then you're going to see that this runs even faster.

For the next part we're going to use the interactive shell again. What you can also do is collect the garbage manually to see what happens. We import gc, call gc.set_debug(True), and now I can call gc.get_count(). Then I can call gc.collect and pass a generation: collect generation 0, and you can see I get this message, collecting generation 0, and that the count for generation 1 increased as well. When I collect generation 2, you can see it collects everything. So collecting generation 2 also collects everything else, or basically resets everything
to the beginning. So that is also something you can do. You can disable, and of course you can enable again, so you can also disable garbage collection for a certain section. For example, if you have some database access which causes a lot of references to be created, but you don't really have many unreachable objects, so you don't want to do garbage collection too much, you can call gc.disable(), then run some code, then call gc.enable() afterwards, and then maybe call gc.collect() manually to catch up. That is something that can, in certain circumstances and situations, speed up your code massively.

So that's it for today's video. I hope you enjoyed it and learned something. If so, let me know by hitting the like button and leaving a comment in the comment section down below, and of course don't forget to subscribe to this channel and hit the notification bell to not miss a single future video for free. Other than that, thank you very much for watching, see you in the next video, and bye.
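
The timing experiment described in the transcript can be reconstructed roughly as follows. This is a sketch based on the spoken description, not the author's exact script: the function name build_links and the constant N are assumptions made for this example, and N is far smaller than the 5 million iterations mentioned in the video so that it finishes quickly. Note that gc.disable() only switches off the automatic cycle collector; reference counting still frees objects as usual.

```python
import gc
import time


class Link:
    def __init__(self, next_link, value):
        self.next_link = next_link
        self.value = value

    def __repr__(self):
        # Show the value instead of the default object representation.
        return self.value


def build_links(n):
    """Create n Link objects pointing at one main link; return elapsed seconds."""
    main_link = Link(None, "Main Link")
    my_list = []
    start = time.perf_counter()
    for _ in range(n):
        my_list.append(Link(main_link, "L"))
    end = time.perf_counter()
    return end - start


N = 200_000  # assumption: scaled down from the 5 million used in the video

original = gc.get_threshold()
results = {}

results["default"] = build_links(N)

gc.set_threshold(20_000, 50, 100)   # the values mentioned in the video
results["raised"] = build_links(N)

gc.disable()                        # no automatic cycle collection at all
results["disabled"] = build_links(N)
gc.enable()

gc.set_threshold(*original)         # restore the previous settings
for name, seconds in results.items():
    print(f"{name:>8}: {seconds:.3f} s")
```

The absolute timings depend on the machine and the Python version, and the gap between the three runs may be much smaller than in the video, so it is worth measuring on your own workload before changing thresholds in real code.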

Original Description

Today we will learn about garbage collection in Python: how it is done and what we can do manually in order to speed up our code.

📚 Programming Books & Merch 📚
🐍 The Python Bible Book: https://www.neuralnine.com/books/
💻 The Algorithm Bible Book: https://www.neuralnine.com/books/
👕 Programming Merch: https://www.neuralnine.com/shop

💼 Services 💼
💻 Freelancing & Tutoring: https://www.neuralnine.com/services

🌐 Social Media & Contact 🌐
📱 Website: https://www.neuralnine.com/
📷 Instagram: https://www.instagram.com/neuralnine
🐦 Twitter: https://twitter.com/neuralnine
🤵 LinkedIn: https://www.linkedin.com/company/neuralnine/
📁 GitHub: https://github.com/NeuralNine
🎙 Discord: https://discord.gg/JU4xr8U3dm

This video teaches how to use garbage collection in Python to speed up code, including understanding generations, object lifetime, and manual threshold setting. It provides practical steps to optimize Python code and improve performance.
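
One way to watch the generations at work is to count how often each one is collected while many long-lived objects are created. The sketch below assumes CPython and uses gc.callbacks, a list of hooks the collector calls before and after each run; count_collections and Node are names made up for this illustration. Exact counts depend on the Python version and the configured thresholds, so none are stated here.

```python
import gc

collections_per_generation = {}


def count_collections(phase, info):
    # The collector calls this with phase "start" or "stop" and an info dict
    # that contains the generation being collected.
    if phase == "start":
        gen = info["generation"]
        collections_per_generation[gen] = collections_per_generation.get(gen, 0) + 1


class Node:
    """Hypothetical container object tracked by the collector."""


gc.callbacks.append(count_collections)
try:
    # Keep every object alive so that survivors get promoted to older generations.
    survivors = [Node() for _ in range(100_000)]
finally:
    gc.callbacks.remove(count_collections)

print(gc.get_threshold())
print(collections_per_generation)
```

With thresholds like the 700, 10, 10 quoted in the video, the youngest generation should show by far the most collections, which matches the idea that older objects are checked less often.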

Key Takeaways
  1. Import the gc and sys modules
  2. Use sys.getrefcount to get the number of references to an object
  3. Use gc.get_referrers to get a list of objects that reference the given object
  4. Use gc.get_threshold to get the current garbage collection thresholds
  5. Use gc.set_debug to enable debug output for garbage collection
  6. Pass True (equal to gc.DEBUG_STATS) to print a message for each collection
  7. Change the thresholds using gc.set_threshold
  8. Disable garbage collection using gc.disable
  9. Manually collect garbage using gc.collect
💡 Garbage collection in Python can be manually controlled and optimized using the gc module, allowing for improved code performance and memory management.
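
The takeaways above can be strung together in one short session. This is a minimal sketch that assumes CPython; gc_paused is a hypothetical helper written for this example (it is not part of the gc module), and the list of dictionaries merely stands in for allocation-heavy work such as the database access mentioned in the video.

```python
import gc
import sys
from contextlib import contextmanager


@contextmanager
def gc_paused():
    """Hypothetical helper: turn automatic collection off inside a with-block."""
    was_enabled = gc.isenabled()
    gc.disable()
    try:
        yield
    finally:
        if was_enabled:
            gc.enable()


a = "hello world"
my_list = [a]

print(sys.getrefcount(a))                               # references to a
print(any(r is my_list for r in gc.get_referrers(a)))   # is my_list among the referrers?
print(gc.get_threshold(), gc.get_count())               # thresholds and current counters

old_thresholds = gc.get_threshold()
gc.set_threshold(1000, 20, 30)                          # collect less often

gc.set_debug(gc.DEBUG_STATS)                            # True == 1 == gc.DEBUG_STATS
gc.collect(0)                                           # generation 0 only; stats go to stderr
gc.set_debug(0)                                         # debug output off again

with gc_paused():
    paused_state = gc.isenabled()                       # False while paused
    rows = [{"id": i} for i in range(50_000)]           # stand-in for allocation-heavy work

unreachable = gc.collect()                              # full collection to catch up
gc.set_threshold(*old_thresholds)                       # restore the previous thresholds
```

Wrapping the disable and enable calls in a context manager is a design choice made here so that collection is switched back on even if the code in between raises an exception.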
