Rolling Apply and Mapping Functions - p.15 Data Analysis with Python and Pandas Tutorial
Key Takeaways
The video demonstrates rolling apply and function mapping in pandas for custom data transformations: loading data from a pickle file, generating labels for machine learning classification, and calculating a moving average with a custom function. pandas and NumPy handle the data cleaning and manipulation, in preparation for feeding the data through a scikit-learn classifier in the next part.
Full Transcript
What is going on, dudes and dudettes? Welcome to part 15 of our data analysis with Python and Pandas tutorial series. In this part, we're going to be talking about how we can start structuring our data set so we can feed it through a scikit-learn machine learning classifier.
First of all, at the end of the previous tutorial, we basically took all this data and we stuffed it into this HPI.pickle. That means we are all set here. Now, if you want to, I would recommend saving the code from part 14. That way, if you want to add more stuff to your pickle, you can do that. But of course, you can always load in the pickle, add some new functions to that pickle, and then make a new pickle. Okay? You can do whatever you want. But for now, I'm going to just delete basically all this code all the way up to this point.
Okay. So in order to feed it through a machine learning classifier, we've got to do one major thing. With machine learning generally, with supervised machine learning anyway (I mean, obviously we could also try feeding it through unsupervised), you have features and you've got your labels. So you've got feature sets, usually. Features would be, you know, the state HPI, state unemployment, US GDP, S&P 500. These are features at the moment. And then the label, or the classification, is, you know, that current HPI value. Our goal, at least in this scenario, is to predict changes in the HPI. So what we end up doing is we're going to take current HPI values, compare current to next month's HPI value, or maybe the next two months' HPI value, and either HPI went up or HPI went down. If HPI went up, we call that a one. If HPI went down, we call that a zero. So that's the classification. And then the feature set that made up that classification is that month's, you know, features. Okay? So we feed all those features through a machine learning classifier.
So we say: okay, machine learning classifier, today we have such-and-such GDP, the current HPI is X, we've got the unemployment rate at, you know, Y, and so on. We go down the list. What's going to happen next month? Are we going to go up or down? And the goal is to have relatively high accuracy. Ideally something over 70%, but obviously we'd like 80s or more. Obviously, we'd like 100%, but we're not going to get it. We want something that's relatively accurate at predicting the housing price index.
So the first thing we have to do is generate that label, right? Because right now we have the HPI and we can find out what the next HPI value is, but we don't currently have those numbers. So we need to generate that data. If you're still kind of fuzzy on that, don't worry. I think it'll clear up for you as we start writing and coding all this.
In order to do that, we're going to cover mapping functions, and since we're going to map these custom functions, I'm also going to talk about the rolling apply, because those both are very similar in my book. Basically, if pandas doesn't do for you what you want pandas to do, you can write a quick function and make that happen, or you can write a quick apply and do the rolling apply and that kind of stuff. So this is pandas' way of letting you have such an awesome module, but if they don't have the things that you need specifically, it's fully customizable.
So first of all, we want to load in our data. Loading in our data is pretty simple. It's going to be housing_data equals pd.read_pickle. I do like that one line to read a pickle. That is nice. HPI.pickle. Boom. We read the pickle in. Goes to housing_data. Good to go. Okay, next. Side note: I wonder if pandas can read every pickle, or if that pickle must be a data frame, because otherwise you would never even import pickle, right?
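A minimal sketch of the load step described here. The tutorial's HPI.pickle is not available in this example, so the sketch first writes a small stand-in DataFrame to a temporary pickle; the column values and the temporary path are invented for illustration.

```python
import os
import tempfile

import pandas as pd

# Stand-in for the tutorial's HPI data; these values are invented.
sample = pd.DataFrame(
    {"TX": [100.0, 101.0, 103.0], "United States": [100.0, 100.4, 100.9]},
    index=pd.to_datetime(["2000-01-31", "2000-02-29", "2000-03-31"]),
)

with tempfile.TemporaryDirectory() as tmp_dir:
    # Hypothetical location; the tutorial reads 'HPI.pickle' from the working directory.
    pickle_path = os.path.join(tmp_dir, "HPI.pickle")
    sample.to_pickle(pickle_path)

    # The one-line load shown in the video.
    housing_data = pd.read_pickle(pickle_path)

print(housing_data.head())
```

On the side note about other pickles: as far as I know, pd.read_pickle hands back whatever object was stored in the file, so it is not limited to DataFrames, though the standard pickle module remains the usual choice for non-pandas objects.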
We can even get rid of... I can't believe that's still around. I don't know how that's still even there. We haven't used that for a long time. Anyway, I wonder if you could always use pandas for, like, all pickles or something, because using only one line to open up a pickle and read it and stuff, that's kind of nice.
Anyway, what we're going to do now: we've read the pickle in, so now we have the housing data. Now, if you recall, before, when we did our little percent change thing, we wanted the percent change over time. But when we're investing, we don't really care about percent change from the first value. We care about what's the value now and what's the percent change to the next value. So what we're going to go ahead and do is actually apply a percent change to the entire data frame. We do that by going like this: housing_data equals housing_data.pct_change(). Done. Let's print out housing_data, just so we can kind of see where we are.
Uh-oh. It's saying no. Liar. You are lying to me. I know I have it there. What is this sorcery? Let's see. Did we save it as HPI? Maybe I didn't save it as HPI.pickle. I'm pretty sure I did. Let me pull it up. Here we go. Whoops, went the wrong way. Pandas tutorial. Apparently, I don't have it. This is the code that we wrote: HPI to pickle. Oh, I must have gotten an error. That's really strange. I don't recall getting an error in that last video, but maybe I cut it off before I was done making the pickle or something. Odd. Or maybe the error was just, like, off the screen. I don't know. Anyway. Okay. There, I have the pickle now. That was odd. I'm so surprised I didn't catch that in the last video.
Anyway, carrying on, loading the pickle. There we go. So the first value is not a number, because obviously we can't calculate a percent change off nothing. So, okay. But otherwise, we've got all the state HPIs, or housing price indexes, or indices.
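To make the percent-change step concrete, here is a small sketch on toy data (the numbers are invented, and a zero is included on purpose). It shows why the first row comes out as not a number and how an infinity can appear when the previous value is zero.

```python
import pandas as pd

# Toy index values, invented for illustration; the 0.0 is there to provoke an infinity.
housing_data = pd.DataFrame({"United States": [100.0, 102.0, 101.0, 0.0, 5.0]})

# Each row becomes (current - previous) / previous.
pct = housing_data.pct_change()
print(pct)
```

The first row has no previous value, so it is NaN; the last row divides by zero, so it is infinite. Both cases need handling before the data goes anywhere near a classifier.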
And then we have... although I think we're missing the benchmark, unless it was truncated, but I don't think it was. That's okay. We'll leave that out for now anyway. And yeah, at this point we want to drop the not-a-numbers, basically. And so we can either do... actually, let's go back. Let's see what that was. Yeah, sometimes for some reason I've gotten a number being infinity before. That's weird. In the past, I've seen that when you do percent change... okay, at least we have some infinity values. So we've got not a number and negative infinity. So we've got to handle both of those.
So first of all, we can do housing_data.dropna(inplace=True). Second, we don't have NumPy imported, so let's bring in NumPy: import numpy as np. Coming back down, what we're going to go ahead and do here is say housing_data.replace, and actually, let's do this above the dropna. We're going to replace infinity with not a number. We're going to replace np.inf as well as negative np.inf, and we're going to replace that with np.nan. And then inplace equals True. That's really the only way we're going to get away with that, as far as I know. Okay. Then take this, paste that down there. Oops, forgot our comma. Let's make sure that worked up till now. Here it is. Looks like it did. So that's good.
The next thing that we're going to do now is we want to take the... let's see. Do we not have that US HPI? Very angry. Now let's go back to part 14, and let's bring that old code back up. And let's see. There's HPI benchmark here. So I'm going to copy that. And apparently we never had that as a part of the... let's see, HPI data equals read pickle, HPI bench equals HPI benchmark. But we never added that. Let's go ahead and copy this.
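The cleanup order described here (replace infinities first, then drop the not-a-numbers) can be sketched as follows. The DataFrame is toy data with invented values.

```python
import numpy as np
import pandas as pd

# Toy percent-change output containing NaN, +inf and -inf (values invented).
housing_data = pd.DataFrame(
    {"A": [np.nan, 0.02, np.inf, 0.01], "B": [np.nan, -np.inf, 0.03, 0.04]}
)

# Turn +/- infinity into NaN first, so a single dropna removes every bad row.
housing_data.replace([np.inf, -np.inf], np.nan, inplace=True)
housing_data.dropna(inplace=True)

print(housing_data)
```

Doing the replace before the dropna matters: dropna on its own leaves infinite values in place, because infinity is not treated as missing.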
And I'm going to add the benchmark, because I guess we just didn't have it there. So go ahead and run this one more time. Save that into the pickle, because we do want to compare it to the benchmark. I mean, the benchmark is what we want. We could do any state though, right? You could run this on just the state of Texas or any other state. But anyway, let's close this out now. Hopefully... let's run this. Yeah. Okay. So, United States. Probably should call that United States HPI, but we'll leave that for now. Anyway, so that's called United States.
So, housing_data.head(). Now we have all the data that we want. I'm going to get rid of that. Now we're going to say housing_data, and then the column here is going to be US_HPI_future. So this is the future US HPI. It equals housing_data['United States'], and then we do shift(-1). Now print housing_data.head(), and what it should do is shift that column by one point, so it's the future value.
So, yeah. Here, the current value is 4019, and in fact, maybe what we should do, just so it's so much more clear, would be this: take US_HPI_future, and the current value is United States. Paste. So the future will be on the left, the current will be on the right. So here's the current value, and it's saying the future is 4957, and sure enough, we can look into the future to see that. So what it did is it copied that column but shifted it by minus one. So that's our way of, you know, accessing the future number.
Well, of course, we'll just drop NaNs this way, because the final value would be a not a number. Now what we want to do is calculate the difference. We want to see: is that a bigger or a lesser number?
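A short sketch of the shift step, on toy values that are invented and only loosely echo the ones read out above. Note that shift(-1) pulls each row's next value up into the current row, which is what makes it the "future" column.

```python
import pandas as pd

# Toy monthly percent changes (invented values).
housing_data = pd.DataFrame({"United States": [0.0040, 0.0049, 0.0052, 0.0051]})

# Each row receives the value from the row after it.
housing_data["US_HPI_future"] = housing_data["United States"].shift(-1)

# The last row has no following value, so it is NaN and gets dropped.
housing_data.dropna(inplace=True)

# Future on the left, current on the right, as in the video.
print(housing_data[["US_HPI_future", "United States"]])
```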
So the way that we can do that is with function mapping. We'll come up to the very top and we're going to define a new function, and we're going to call this... because that'll be our label, basically. So we're going to define create_labels, and then we've got cur_val, fut_hpi, something like that. Actually, maybe we'll make this cur_hpi. And then the question is: if the future HPI is greater than the current HPI, well, that means we want to invest; that was a good thing. So we'll say that's a one, so we return a one. Else, return a zero. So if the future HPI is a drop, we're going to say that label is a zero. So every time HPI goes up, we take the current features, and the label for those current features is a one. If it falls, we take the current features and say the label for those features is zero, and then we feed that into our machine learning classifier. So we have that information.
One problem that can arise with this is, as you can see in the graphs, mostly HPI goes up over time. So you might have a really strong bias to predicting up every single time. You'd want to look at that and take that into account. But even if you had that strong bias, if HPI really is going up all the time, then your strong bias should pay out, right?
So anyway, that's create_labels. So then how do we map this kind of functionality to an entire column? Well, what you do is... and in fact, we don't need to print this right now. We'll come back to printing stuff. We'll say here housing_data['label'] equals, and then what you want is list, map, and then in here you first put the function that you want to run. We want to run create_labels, and then after this you put all the parameters. We have two parameters, so we just put two parameters. If you had five parameters, you'd put in five parameters. Doesn't matter. So, create_labels, and we have two parameters.
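The labeling function and the list(map(...)) pattern described up to this point can be sketched as below. The parameter names cur_hpi and fut_hpi are a best guess at what is typed on screen, and the data is invented.

```python
import pandas as pd


def create_labels(cur_hpi, fut_hpi):
    """Return 1 if the future HPI is higher than the current one, else 0."""
    if fut_hpi > cur_hpi:
        return 1
    return 0


# Toy current/future values (invented).
housing_data = pd.DataFrame(
    {
        "United States": [0.0040, 0.0049, 0.0052, 0.0051],
        "US_HPI_future": [0.0049, 0.0052, 0.0051, 0.0035],
    }
)

# map() walks both columns in parallel and calls create_labels on each pair;
# list() materializes the result so it can be assigned as a new column.
housing_data["label"] = list(
    map(create_labels, housing_data["United States"], housing_data["US_HPI_future"])
)

print(housing_data)
```

The first two rows rise, so they are labeled 1; the last two fall, so they are labeled 0.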
We want the current HPI and the future HPI. Well, the current HPI is right here. So the current HPI is housing_data['United States']. Whoops. And then the next is the future HPI. Well, we have that too. That is US_HPI_future. Whoops. This value, paste. Okay, so that's kind of a long line, but basically you just do the list of the map of the function and all of the parameters after that, and that's going to create us a new column. So now we can print housing_data.head(). Yeah, that should be fine. So that dropped the not-available value.
So then you have your labels. Here we've got label one. That means it went up. Okay. I don't even see it. Oh, here it is. So from 40 to 49, right? Or, well, 0040 versus 0049. That's up. Good, label one. Then obviously here we went from the current, let's see, this would be row three. So 5260, but then we went down to 5118. So that's a label zero, right? And you can see that's true from here as well. And then again we went from 5118 to 35; again that's a drop, so zero, and so on. So anyway, our label is working. So that's how you can apply an entire function to a data frame.
And then, since we're on the topic of these custom functions being applied: that's a custom function applied to the entire data frame. What about that rolling apply we were talking about before? We can do this pretty quickly. So let's just do it up at the top. Go ahead and do from statistics import mean. If you're in Python 3 and greater, you should be able to do that. In Python 2, you can't. So if you're following along in two, shame on you. Anyway, I'm getting good at making up these rhymes. Now we're going to define moving_average. Of course, I know that we can do a moving average in pandas already, but this is just me kind of showing you the rolling apply in action. So moving_average, and moving_average takes values. That's all we have to pass through. And then basically we return the mean of values.
Simple as that. And really, you wouldn't even need a function at all for this. You could actually rolling-apply mean on the values. But we're trying to show that you can make your own function, so that's what we're doing. So then what you would do at the very end here is say something like this: housing_data, and then we'll just say ma_apply_example. We won't save this past this tutorial, but I'm just showing it. Equals pd.rolling_apply. And we want the rolling apply to be applied to what data? Well, I think we have M30. So we can do housing_data and then capital M30, like that. So that's what we want to apply it to. What is the window that we want? We want a window of 10. Now, normally that's all you need, right? With a rolling mean, for example, you pass the data that you want it to be applied to and then the window. But in this case, this is a rolling apply. We have to specify what function we want this rolling apply to be done with. So after that, you say moving_average, and that's that. So now we can take this, paste. Actually, head's not going to work, right? Because it's moving. So anyway, let's do tail. And there you go. Your rolling apply worked.
Okay. So anyway, long story short, that's mapping a function to the entire data frame. I find I use mapping a function way more; I don't think I've ever legitimately used a rolling apply other than to show it as an example. But I'm sure that's just me. I just haven't come across a reason for it, but I'm sure people have. Otherwise, it probably wouldn't exist in pandas. But yeah, those two things, mapping functions and rolling apply, basically open the door for you to customize pretty much everything in pandas, to get any sort of data manipulation or calculation run on your data frame. And I think that's really cool.
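A note for anyone following along on a recent pandas: the top-level pd.rolling_apply function used in the video was deprecated and later removed, and the same operation is now written as .rolling(window).apply(func) on the column. The sketch below uses that spelling with a toy M30 column (values invented) and a window of 3 instead of 10 so the output stays short.

```python
from statistics import mean

import pandas as pd


def moving_average(values):
    """Custom function handed to the rolling apply; here simply the mean."""
    return mean(values)


# Toy stand-in for the M30 column (invented values).
housing_data = pd.DataFrame({"M30": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]})

# Current equivalent of pd.rolling_apply(housing_data["M30"], 3, moving_average).
housing_data["ma_apply_example"] = housing_data["M30"].rolling(3).apply(moving_average)

# The first window - 1 rows are NaN, which is why tail() is more informative than head().
print(housing_data.tail())
```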
That shows some serious foresight on, you know, the side of pandas. So, anyway, that's pretty nifty. And a lot of times, you know, there's probably a way to make this happen without mapping a function. Okay, there probably is a way. I just don't know it. And there's going to be a lot of little stuff; like, this one's probably pretty easy. You might be able to do a one-liner and have that be generated. But sometimes you might have pretty complex logic that you want to have mapped, and it's just easy: you just make a function. It's just a lot easier on you.
Anyway, that's it for this tutorial. In the next tutorial, now that we have labels, we can really quickly create features, because we already know how to reference a list of columns. So from there, we've got our features, we've got our label, and we can feed it through a machine learning classifier very fast in the next tutorial. So stay tuned for that. Questions, comments, suggestions, whatever, leave them below. Otherwise, as always, thanks for watching. Thanks for all the support and subscriptions. Till next time.
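On the remark that a one-liner could probably generate the same labels without mapping a function: one possible version, not shown in the video, compares the two columns directly and casts the boolean result to integers. The data is invented.

```python
import pandas as pd

# Toy current/future values (invented).
housing_data = pd.DataFrame(
    {
        "United States": [0.0040, 0.0049, 0.0052, 0.0051],
        "US_HPI_future": [0.0049, 0.0052, 0.0051, 0.0035],
    }
)

# Element-wise comparison gives True/False; astype(int) turns that into 1/0.
housing_data["label"] = (
    housing_data["US_HPI_future"] > housing_data["United States"]
).astype(int)

print(housing_data)
```

This covers the simple up-or-down case; for more involved logic, the mapped function approach from the tutorial is easier to read and extend.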
Original Description
In this data analysis with Python and Pandas tutorial, we cover function mapping and rolling_apply with Pandas.
The idea of function mapping and rolling apply is to allow you to fully customize Pandas to do whatever you need. If there isn't a pre-built method or function for you to run against your dataframe to do analysis or manipulation, you can use function mapping, creating your own function entirely.
Sample code and text-based version of this tutorial: http://pythonprogramming.net/rolling-apply-mapping-functions-data-analysis-python-pandas-tutorial/
If you need to do something similar to this, but in a rolling fashion with a moving window, then you can do this with rolling_apply. Both will be covered here.
http://pythonprogramming.net
https://twitter.com/sentdex