Learn Python: Binning Data Using The Cut Function #python
Key Takeaways
The video demonstrates how to use the pandas cut function to bin data in Python for data analytics tasks, such as creating price bins and analyzing the relationship between price and carat weight.
Full Transcript
Binning data is an important task we often have to perform in analytics. Let's take a look at how to do this using the cut function from the pandas library in Python. First, we read in our diamond data. In this data set, each row is a diamond. We have the price of the diamond (here, $326), information like the quality of the cut, the color, and the clarity, and then the carat weight. If I were performing this analysis for myself, I might look at a scatter plot of carat weight versus price to understand the relationship. But a lot of the time you might want to take a variable like price and bin it into something like 10 price bins, so we can produce a bar plot or look at our analysis from different angles.
So let's go ahead and create a price bin column. We use the cut function from pandas: we pass in our price column, and then we list out the borders of our bins. Our bins are going to be 0 to 500, 500 to 1,000, 1,000 to 2,000, 2,000 to 5,000, 5,000 to 10,000, and 10,000 to 20,000. If we take a look at our data now, we have a price bin column that shows the range each diamond falls into. Each bin is open on its lower bound and closed on its upper bound, so a bin displayed as (0, 500] includes 500 but technically excludes zero. Still, we were able to successfully produce these bins.
This type of variable isn't necessarily the easiest to work with, so we can also create labels for the bins. This gives us a column of readable labels such as 0-500, 500-1k, and so on, representing our bins. If we wanted to create a plot of price range by average carat weight, for example, we could even add a playful headline like "Breaking: Big Diamonds More Expensive." More usefully, we can explore the relationship between price and average carat weight while reducing some of the noise and complexity we might see in a traditional scatter plot.
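A rough sketch of the steps above follows. It assumes a small made-up DataFrame in place of the real diamonds file (the video's loading code isn't shown), and the column names price_bin and price_label are hypothetical. It shows explicit bin edges, the right-closed interval behaviour, the include_lowest option for keeping a price of 0, text labels, and the per-bin average carat weight.

```python
import pandas as pd

# Hypothetical stand-in for the diamonds data: a few made-up rows with the
# two columns used here (price in dollars, carat weight).
diamonds = pd.DataFrame({
    "price": [326, 450, 500, 750, 1_200, 3_000, 4_999, 8_500, 15_000, 18_000],
    "carat": [0.23, 0.30, 0.31, 0.40, 0.50, 0.90, 1.01, 1.20, 2.00, 2.10],
})

# Bin edges from the video: 0-500, 500-1k, 1k-2k, 2k-5k, 5k-10k, 10k-20k.
edges = [0, 500, 1_000, 2_000, 5_000, 10_000, 20_000]
labels = ["0-500", "500-1k", "1k-2k", "2k-5k", "5k-10k", "10k-20k"]

# Default cut: right-closed intervals such as (0, 500], so 500 falls in the
# first bin while a price of exactly 0 falls outside every bin (NaN).
diamonds["price_bin"] = pd.cut(diamonds["price"], bins=edges)

# include_lowest=True also closes the first interval on the left, so 0 is kept.
zero_default = pd.cut(pd.Series([0]), bins=edges)
zero_inclusive = pd.cut(pd.Series([0]), bins=edges, include_lowest=True)

# labels= swaps the Interval categories for readable text labels.
diamonds["price_label"] = pd.cut(diamonds["price"], bins=edges, labels=labels)

# Average carat per labelled price bin: one value per bin instead of a point
# cloud. observed=False keeps every bin, even empty ones, in bin order.
avg_carat = diamonds.groupby("price_label", observed=False)["carat"].mean()
print(avg_carat)
```

Calling avg_carat.plot.bar() (matplotlib required) would give a bar chart along the lines of the one described in the video.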
Additionally, we could just specify a number of bins, which will be equally spaced (equal width). Instead of listing our bin edges, we can simply ask for 10 price bins and plot them the same way. We'd need to clean up our axes, since leaving out labels makes things a little messy. But if there are no hard boundaries we want to set as analysts, we can just specify the number of bins we want.
Finally, there's an additional function called qcut, which bins by quantiles. Instead of producing four equally spaced bins, passing 4 to qcut produces quartiles, and passing 10 produces deciles. Because cut splits the price range into intervals of equal width while qcut chooses edges so that each bin holds roughly the same number of diamonds, qcut can give somewhat different, sometimes drastically different, results than asking for 10 bins with the traditional cut function. So that's how we can bin data with pandas. Hopefully you found this helpful. We'll see you in the next video.
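To illustrate the difference between equal-width and quantile bins, here is a minimal sketch using a made-up, right-skewed set of prices (illustrative values only, not the real diamonds data). With skewed data, most values crowd into the first equal-width bin, while the quartile bins each hold the same count.

```python
import pandas as pd

# Made-up, right-skewed prices (illustrative only, not the real diamonds data).
prices = pd.Series([326, 340, 380, 420, 500, 550, 700, 900,
                    1_100, 1_500, 2_000, 2_600, 3_500, 5_000, 7_500, 18_000])

# cut with an integer: 4 intervals of equal width spanning min to max.
equal_width = pd.cut(prices, bins=4)

# qcut with an integer: 4 quantile-based bins (quartiles), so each bin
# holds roughly the same number of observations.
quartiles = pd.qcut(prices, q=4)

# Count observations per bin, listed in bin order.
width_counts = equal_width.value_counts().sort_index()
quartile_counts = quartiles.value_counts().sort_index()
print(width_counts)
print(quartile_counts)

# q=10 gives deciles instead of quartiles.
deciles = pd.qcut(prices, q=10)
```

Here the equal-width bins come out as 13, 2, 0 and 1 values, while each quartile holds 4, which is the kind of "drastically different" result the video mentions.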
Original Description
Binning data is an important task you need to learn if you're working in Analytics.
In this example, we'll walk through how you can bin your data using Python.
We'll use the Pandas Cut function, and you'll see how quickly we can produce bins and bin labels.
This can be really useful in aggregating your data to reduce the noise you would see if you were looking at individual data points.
Happy Learning!
Maven Analytics is an award-winning learning platform where individuals and teams build new skills, showcase work, and connect with experts around the world.
We've helped 2M+ people build job-ready data literacy & AI skills, master tools like Excel, SQL, Power BI, Tableau and Python, and build the foundation for successful careers.
Start building life-changing skills for FREE at mavenanalytics.io.