How do I change the data type of a pandas Series?
Key Takeaways
The video demonstrates how to change the data type of a pandas Series using the `astype` method and the `dtype` parameter of `read_csv`, including converting an integer column to floating point, converting a string column to a numeric type, and converting a boolean Series to integers.
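The two approaches from the video (convert after reading with `astype`, or set the type while reading with `dtype`) can be sketched as below. This is a minimal sketch, assuming a tiny in-memory CSV in place of the drinks-by-country file so it runs offline; the rows and values are illustrative, not taken from the real dataset.

```python
import io

import pandas as pd

# Illustrative stand-in for the drinks-by-country CSV (made-up rows,
# only two of the real columns) so the example needs no network access.
CSV_TEXT = """country,beer_servings,continent
Albania,89,Europe
Algeria,25,Africa
Andorra,245,Europe
"""

# Approach 1: read first, then convert the column with Series.astype
# and overwrite the existing column.
drinks = pd.read_csv(io.StringIO(CSV_TEXT))
print(drinks.dtypes)  # beer_servings is parsed as an integer column
drinks["beer_servings"] = drinks["beer_servings"].astype(float)
print(drinks.dtypes)  # beer_servings is now float64

# Approach 2: set the type during reading via read_csv's dtype parameter,
# which takes a dict mapping column name -> type.
drinks2 = pd.read_csv(io.StringIO(CSV_TEXT), dtype={"beer_servings": float})
print(drinks2.dtypes)
```

Both paths end with the same float column; the `dtype` route simply avoids a second pass over the DataFrame.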
Full Transcript
Hello and welcome back to my Q&A video series on the pandas library in Python. The question for today is: how do I change the data type of a pandas Series? Let's jump right in with an example dataset. We're going to import pandas as pd, and the dataset we're going to start with is alcohol consumption by country, so drinks equals `pd.read_csv` with the bit.ly URL bit.ly/drinksbycountry. We've run that, so let's take a look at the head. We see six columns, four of which are numeric. Let's take a look at the data types of these columns; we use the `dtypes` attribute of the drinks DataFrame to find that out. What we see is that three of our columns are integer columns, we've got one floating point column, the `total_litres_of_pure_alcohol` column, and then two columns of type object, which basically means string. So country and continent are just strings.
Now let's pretend for a second that we want to convert the `beer_servings` column to floating point rather than integer. All we have to do is use the Series method `astype`: we say `drinks.beer_servings.astype(float)`, and that just means convert it to type floating point. If you want to actually modify the DataFrame, you can either add a new column or overwrite an existing one; I'm going to overwrite the existing `beer_servings` column. We'll run that, and if we check the `dtypes` now, we'll see that it has changed and `beer_servings` is now a floating point column.
Now you might be wondering what the usefulness of that is, and I admit it's not the most useful example. You'll usually do this when, for example, you have a data file where the numbers are stored as strings. When you read it into pandas they're strings, but you want to do math on them. To do math on a column it has to be a numeric type, so you'd use something like this: you'd select the column and call `.astype(float)`. That's the more common use case.
Next I want to show you a follow-up question from the YouTube comments, which is from Eli. Eli asks, and we're going to look at number three here, how to define the type of each column before actually reading the CSV. Great question, thanks for that, Eli. What we're going to do is change the data types during the CSV reading process. I'm going to copy and paste this line, and all I have to do is add one more parameter, `dtype`, and pass it a dictionary. The dictionary key is `beer_servings` and the dictionary value is float. If I do this and check the `dtypes`, you'll see that once again `beer_servings` has been converted to floating point. The only difference between this and the approach above is that this method does it during the reading process, whereas up there we converted it after the DataFrame had already been created.
I'm going to show you one more dataset and do another example with it. This is the dataset of orders from Chipotle, so we'll say orders equals `pd.read_table` with bit.ly/chiporders. Let's take a look at it. The column I want to focus on is the `item_price` column. You might be wondering: is that a float, or some special currency type, or what is it? What you'll actually see is that pandas is storing that column as an object, meaning a string, because it doesn't understand that these are numbers. So if you want to do some math with it, you're going to have to convert the type.
We saw in a previous video that if we want to remove a character from a Series, for instance this dollar sign, we can use a string method. So we'll say `orders.item_price.str.replace`, and I want to replace the dollar sign with nothing. You might think that's enough, but if you try to do a mathematical operation on the Series it will give you an error, because even though I removed the dollar sign, the rest of it is still a string. So I have to cast it to a float, and once I do, I can perform a mathematical operation on it.
As always, we're going to end with a bonus, and for the bonus I want to talk about the item name column. So let's say `orders.item_name`. We saw in a previous video how to use the `str.contains` method to check for the presence of a substring; "Chicken" is what we're going to check for in this case. It returns a Series of Trues and Falses indicating whether or not the substring was found in each row. Now, you can imagine a case in which we need these as zeros and ones instead. For instance, if you were building a machine learning model and this was one of your input features, you'd need it to be numeric, and you would use zeros and ones to represent False and True. That's actually super simple: you just say `.astype(int)`, and it has been converted to zeros and ones.
So that's it for today. Thank you so much for joining me. Feel free to subscribe if you'd like to see more videos like this, and please leave a question or a tip in the comments below so we can all help each other to learn. I hope to see you again soon.
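The two-step fix for the price column (strip the dollar sign, then cast) might look like this. It is a sketch, assuming a small hypothetical Series of dollar-prefixed strings in place of the real Chipotle `item_price` column. Passing `regex=False` is a choice made here so the `$` is treated as a literal character rather than a regex end-of-string anchor.

```python
import pandas as pd

# Hypothetical prices stored as strings with a leading dollar sign,
# similar in shape to the item_price column described above.
orders = pd.DataFrame({"item_price": ["$2.39", "$3.39", "$16.98"]})
print(pd.api.types.is_numeric_dtype(orders["item_price"]))  # False: strings, not numbers

# Step 1: remove the "$" character (literal match, not a regex).
no_dollar = orders["item_price"].str.replace("$", "", regex=False)

# Step 2: the values are still strings, so cast to float before doing math.
prices = no_dollar.astype(float)
print(prices.mean())
```

Chaining the two calls, `orders["item_price"].str.replace("$", "", regex=False).astype(float)`, gives the same result in one expression.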
Original Description
Have you ever tried to do math with a pandas Series that you thought was numeric, but it turned out that your numbers were stored as strings? In this video, I'll demonstrate two different ways to change the data type of a Series so that you can fix incorrect data types. I'll also show you the easiest way to convert a boolean Series to integers, which is useful for creating dummy/indicator variables for machine learning.
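Converting a boolean Series to a 0/1 indicator is a single `astype(int)` call. This sketch assumes a hypothetical Series of menu item names standing in for the Chipotle `item_name` column.

```python
import pandas as pd

# Hypothetical item names standing in for the item_name column.
item_name = pd.Series(
    ["Chicken Bowl", "Chips and Guacamole", "Chicken Burrito", "Side of Chips"]
)

# str.contains returns a boolean Series: True where "Chicken" appears
# (case-sensitive by default).
is_chicken = item_name.str.contains("Chicken")

# astype(int) maps False -> 0 and True -> 1, producing a numeric
# indicator feature suitable as model input.
chicken_flag = is_chicken.astype(int)
print(chicken_flag.tolist())  # [1, 0, 1, 0]
```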
SUBSCRIBE to learn data science with Python:
https://www.youtube.com/dataschool?sub_confirmation=1
JOIN the "Data School Insiders" community and receive exclusive rewards:
https://www.patreon.com/dataschool
== RESOURCES ==
GitHub repository for the series: https://github.com/justmarkham/pandas-videos
"astype" documentation: http://pandas.pydata.org/pandas-docs/stable/generated/pandas.Series.astype.html
== LET'S CONNECT! ==
Newsletter: https://www.dataschool.io/subscribe/
Twitter: https://twitter.com/justmarkham
Facebook: https://www.facebook.com/DataScienceSchool/
LinkedIn: https://www.linkedin.com/in/justmarkham/