How do I handle missing values in pandas?
Key Takeaways
The video covers handling missing values in pandas, including detecting, dropping, and filling missing values using methods such as isnull(), dropna(), and fillna(). The pandas library is utilized for data manipulation and analysis.
Full Transcript
hello and welcome back to my Q&A video series about the pandas library in Python and the question for today is how do I handle missing values in pandas okay so uh let's get an example data set so we can figure out what to do um we'll import pandas as PD as always and then our example data set for today is uh a data set of UFO reported sightings so UFO equals pd. read CSV and then uh bit. lefo reports okay so we'll read in that data into a data frame and this time let's take a look at the tail the last five rows okay so we see this value n n and what does that mean is that a string like it's a color or what is that it actually uh stands for not a number n an but what it means conceptually is a missing value now uh what does that mean well it means that when building the data frame uh read CSV detected what it thought was a missing value and it flagged it with this special Nan and you'll see why in a bit now let's take a look at the source file for a second this is on GitHub and uh here are the last five rows and for instance in this very last row we've got ebore comma comma oval and this is a CSV file so normally you'd have something between there but it's missing and if we compare that to the data frame we see eore Nan meaning missing and then oval okay so why is it missing I don't know I'd probably guess that there was a web form where people submit these UFO reports and color um where some of the fields are optional and some are required okay so uh let's take a look at some methods for working with missing values uh the first method I want to show you is called is null and it's a data frame method so we'll say ufo. is null. tail okay and the way is null works is it uh shows a false if something's not null if it's something's not missing uh and it shows true if it is missing so you'll see City for these last five rows it's not missing at all so it's false false false false false false for colors reported it's Mo it's true true true false true because uh these ones are missing now it's only because pandas uses this special value na that is null can detect them and produce trues and falses as a result now there's actually a uh inverse method called not null so ufo. not null. tail and you'll see that we just get the exact opposite now why are these methods useful is null and not null uh let me show you some examples so one is a really cool pandis trick which is to say ufo. isnull dosm and what that tells us is the number of missing values in each of the columns now how did that work I'm going to break it down for you because there's a couple things you have to understand so um let's create a pretend Panda Series so pd. series and I'll pass it a list of booleans so I'll say true false true okay so that's I've just created this series of three booleans okay and if I do do sum you're thinking how do you sum booleans well it turns out that when you do that if you ask pandas to do a mathematical operation with booleans it will convert TRS to one and falses to zero so it shows you the number of truths okay so that is hinting you towards why this worked because I'm summing booleans but you can tell that I'm summing them at the column level and why is that well if you think back to a previous video about the axis parameter um sum is doing axis equals 0 by default and if you remember how the axis Works uh for a math matical operation axis Z is saying I want you to do the operation in the direction of the zero axis the row axis so I want you to do a sum across the rows which means Down The Columns okay so in summary when we say ufo. is null. sum ufo. is null is creating this data frame of trues and falses and then the sum is doing column sums of the trues and falses converting the ones to the trues to one and the falses to zero okay so that's why that worked um this is something I do on basically every data frame uh for every data set I read in because I want to know where the missing values are okay now I want to show you another way to use uh this um isnull method and um isnull it turns out is also a series method so I can say like ufo. city.is null okay now I'm actually going to pass that series of booleans to this using this bracket notation we've seen many times I'm passing it a condition that will be used to filter the data frame and when I do that we know that there are only 2 five missing values in all of this in in all of this city series we can see the 25 rows where city is missing okay so that's how we examine a subset of the data frame by only looking at a uh a portion of the data frame okay all right so that's some basic functionality but I haven't answered the question what should you do about missing values and there's not one right or wrong answer to that I can't tell you what to do I can just give you some options and pandas for how to do it and you can decide okay so it really depends upon the data set and the analysis you're doing or problem you're trying to solve so one common thing to do is to drop missing values okay so let's see some options there I'm going to say ufo. shape just to uh see what is the shape of this data frame 18,24 one rows five columns now there's a data frame method called drop Na and if I say ufo. dropna how equals any. shape now this is the default so you don't have to put it but I want to emphasize it drop in a how equals any that means I want you to drop a row if any of its values are um missing okay so if I do that we lose almost all of the rows except for about 2,500 rows okay so this is all that it's left that all that's left if we drop a row rows in which any of the five columns have a missing value okay now I want to mention for a second that um there is an in place parameter here for drop Na and that is false by default which is why if I if I do ufo. shape right now it hasn't changed from the original this is just um because it's in place equals false it just temporarily drops it and then they come right back okay all right now let's see how we might modify drop na okay so now let's try ufo. dropna how equals all. shape and that says only drop a row if all of its values are missing and we knew that wasn't going to drop any rows because up here we saw that state and time don't have any missing values so how equals all is not going to make any changes um in this case Okay so one more option for drop na is called subset so ufo. dropna subset equals and I'm going to pass it a list of columns to consider so I'm going to say City and shape reported and I'm going to uh say how equals any okay so this is saying I want to drop uh a row if either City or shape reported are missing if any of those two are missing for a given row I want you to drop drop it I want you to drop the row okay and we're left with about 15,000 rows okay now you can change this again you can change this to how equals all and that means only drop a row if all of these columns have a missing value for that row and this time only four rows gets dropped because there were only four rows where both city is missing and shape reported is missing okay all right as always we're going to end with a bonus and we're going to talk about filling missing values and um so let's take a look at our our old good old friend value counts so UFO bracket uh shape reported do value counts and value counts is saying how many times did this value occur in in the shape reported Series so the most common was light we've got dis triangle Circle cigar rectangle egg lots of fun values in here okay so you might be thinking wait a minute that doesn't add up to 18,000 and you're right because by default missing values are excluded okay so we if we say drop na equals false we can see that shape reported has those 2644 missing values okay and we saw that uh up here those 2644 missing values Okay so what if I wanted to F fill those in maybe I thought and and what to do here depends a lot upon an understanding of your data but you might think well maybe I should fill in the Nan's with various that kind of makes sense for for it being missing so to do that I'm going to use another method called fill na so I'm going to say UFO shape reported fillna value equals various okay and I'm going to say in place equals true which means make the change in place okay in in the UFO data frame and I'll run that and if I rerun the value counts you'll see that various now has the 333 that were already there and then all of the missing values were converted to various okay so that's it for today uh thanks so much for joining me as always if you want to see more videos like this please click subscribe and then let me know in the comment section below if you have a tip for others or a comment or a question and that's it so I hope to see you again soon
Original Description
Most datasets contain "missing values", meaning that the data is incomplete. Deciding how to handle missing values can be challenging! In this video, I'll cover all of the basics: how missing values are represented in pandas, how to locate them, and options for how to drop them or fill them in.
SUBSCRIBE to learn data science with Python:
https://www.youtube.com/dataschool?sub_confirmation=1
JOIN the "Data School Insiders" community and receive exclusive rewards:
https://www.patreon.com/dataschool
== RESOURCES ==
GitHub repository for the series: https://github.com/justmarkham/pandas-videos
"read_csv" documentation: http://pandas.pydata.org/pandas-docs/stable/generated/pandas.read_csv.html
"isnull" documentation: http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.isnull.html
"notnull" documentation: http://pandas.pydata.org/pandas-docs/stable/generated/pandas.notnull.html
"dropna" documentation: http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.dropna.html
"value_counts" documentation: http://pandas.pydata.org/pandas-docs/stable/generated/pandas.Series.value_counts.html
"fillna" documentation: http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.fillna.html
Working with missing data: http://pandas.pydata.org/pandas-docs/stable/missing_data.html
== LET'S CONNECT! ==
Newsletter: https://www.dataschool.io/subscribe/
Twitter: https://twitter.com/justmarkham
Facebook: https://www.facebook.com/DataScienceSchool/
LinkedIn: https://www.linkedin.com/in/justmarkham/
Watch on YouTube ↗
(saves to browser)
Sign in to unlock AI tutor explanation · ⚡30
Playlist
Uploads from Data School · Data School · 37 of 60
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
▶
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
Setting up Git and GitHub
Data School
Navigating a GitHub Repository - Part 1
Data School
Forking a GitHub Repository
Data School
Creating a New GitHub Repository
Data School
Copying a GitHub Repository to Your Local Computer
Data School
Committing Changes in Git and Pushing to a GitHub Repository
Data School
Syncing Your GitHub Fork
Data School
Allstate Purchase Prediction Challenge on Kaggle
Data School
Troubleshooting: Updates Rejected When Pushing to GitHub
Data School
Hands-on dplyr tutorial for faster data manipulation in R
Data School
ROC Curves and Area Under the Curve (AUC) Explained
Data School
Going deeper with dplyr: New features in 0.3 and 0.4 (tutorial)
Data School
What is machine learning, and how does it work?
Data School
Setting up Python for machine learning: scikit-learn and Jupyter Notebook
Data School
Getting started in scikit-learn with the famous iris dataset
Data School
Training a machine learning model with scikit-learn
Data School
Comparing machine learning models in scikit-learn
Data School
Data science in Python: pandas, seaborn, scikit-learn
Data School
Selecting the best model in scikit-learn using cross-validation
Data School
How to find the best model parameters in scikit-learn
Data School
How to evaluate a classifier in scikit-learn
Data School
What is pandas? (Introduction to the Q&A series)
Data School
How do I read a tabular data file into pandas?
Data School
How do I select a pandas Series from a DataFrame?
Data School
Why do some pandas commands end with parentheses (and others don't)?
Data School
How do I rename columns in a pandas DataFrame?
Data School
How do I remove columns from a pandas DataFrame?
Data School
How do I sort a pandas DataFrame or a Series?
Data School
How do I filter rows of a pandas DataFrame by column value?
Data School
How do I apply multiple filter criteria to a pandas DataFrame?
Data School
Your pandas questions answered!
Data School
How do I use the "axis" parameter in pandas?
Data School
How do I use string methods in pandas?
Data School
How do I change the data type of a pandas Series?
Data School
When should I use a "groupby" in pandas?
Data School
How do I explore a pandas Series?
Data School
How do I handle missing values in pandas?
Data School
What do I need to know about the pandas index? (Part 1)
Data School
What do I need to know about the pandas index? (Part 2)
Data School
How do I select multiple rows and columns from a pandas DataFrame?
Data School
Machine Learning with Text in scikit-learn (PyCon 2016)
Data School
When should I use the "inplace" parameter in pandas?
Data School
How do I make my pandas DataFrame smaller and faster?
Data School
How do I use pandas with scikit-learn to create Kaggle submissions?
Data School
More of your pandas questions answered!
Data School
How do I create dummy variables in pandas?
Data School
How do I work with dates and times in pandas?
Data School
How do I find and remove duplicate rows in pandas?
Data School
How do I avoid a SettingWithCopyWarning in pandas?
Data School
How do I change display options in pandas?
Data School
How do I create a pandas DataFrame from another object?
Data School
How do I apply a function to a pandas Series or DataFrame?
Data School
Getting started with machine learning in Python (webcast)
Data School
Q&A about Machine Learning with Text (online course)
Data School
Your pandas questions answered! (webcast)
Data School
Machine Learning with Text in scikit-learn (PyData DC 2016)
Data School
Write Pythonic Code for Better Data Science (webcast)
Data School
Web scraping in Python (Part 1): Getting started
Data School
Web scraping in Python (Part 2): Parsing HTML with Beautiful Soup
Data School
Web scraping in Python (Part 3): Building a dataset
Data School
More on: ML Maths Basics
View skill →Related AI Lessons
⚡
⚡
⚡
⚡
The AI Moat Paradox: The Better Models Become, the Less Models Matter
Medium · AI
170,927 AI Papers Reveal the Biggest Research Shifts of the First Half of 2026
Medium · Machine Learning
170,927 AI Papers Reveal the Biggest Research Shifts of the First Half of 2026
Medium · Data Science
[PoV] When Everyone Is Smart, No One Is
Medium · AI
🎓
Tutor Explanation
DeepCamp AI