How do I change the data type of a pandas Series?

Data School · Beginner · 📐 ML Fundamentals · 10y ago

Key Takeaways

The video demonstrates how to change the data type of a pandas Series, including converting integer columns to floating point and converting string columns to numeric types, using the `astype` method and the `dtype` parameter during CSV reading.
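
As a minimal sketch of the `astype` conversion summarized above, the snippet below builds a tiny DataFrame by hand instead of downloading the video's dataset. The three rows are made up for illustration and only the `beer_servings` column name follows the video.

```python
import pandas as pd

# Hypothetical stand-in for the drinks dataset used in the video.
drinks = pd.DataFrame({
    "country": ["Afghanistan", "Albania", "Algeria"],
    "beer_servings": [0, 89, 25],
})

print(drinks.dtypes)  # beer_servings starts out as an integer column

# astype returns a new Series; assign it back to change the DataFrame.
drinks["beer_servings"] = drinks["beer_servings"].astype(float)

print(drinks.dtypes)  # beer_servings is now a floating point column
```

Because `astype` does not modify the Series in place, the assignment back to the column is what makes the change stick.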

Full Transcript

Hello and welcome back to my Q&A video series on the pandas library in Python. The question for today is: how do I change the data type of a pandas Series?

All right, let's just jump right in with an example dataset. We're going to import pandas as `pd`, and the dataset we're going to start with is alcohol consumption by country, so `drinks` equals `pd.read_csv`, and I'm going to use the bit.ly URL, bit.ly/drinksbycountry. Okay, so we've run that. Let's take a look at the head, and we see six columns, four of which are numeric. Let's actually take a look at the data types of these columns. We use the `dtypes` attribute of the `drinks` DataFrame to find that out, and what we see is that three of our columns are integer columns, we've got one floating point column, the total litres column, and then two columns which say type object, which basically means string. So country and continent are just strings.

Now let's pretend for a second that we want to convert the beer servings column to floating point rather than integer. All we have to do is use the Series method `astype`, so we say `drinks.beer_servings.astype(float)`, and that just means convert it to type floating point. Now, if you want to actually modify the DataFrame, you can either add a new column or, as I'm going to do, overwrite the existing beer servings column. Okay, so we'll run that, and if we check the `dtypes` now, we'll see that it has changed and beer servings is now a floating point column.

Now you might be wondering what's the usefulness of that, and it's not the most useful example, I admit. You'll usually do this when, for example, you have a data file where the numbers are stored as strings, so when you read it into pandas they're strings, but you want to do math on them. Well, to do math on a column it has to be a numeric type, and you'd use something like this: you'd name the column and then call `astype(float)`. So that's the more common use case for that.

All right, I want to show you a follow-up question from the YouTube comments, which is from Eli. Eli asks, and we're going to look at number three here, how to define the type of each column before actually reading the CSV. Great question, thanks for that, Eli. What we're going to do is actually just change the data types during the CSV reading process. So I'm going to copy and paste this line, and all I have to do is add one more parameter, `dtype`, and I pass it a dictionary. The dictionary key is `'beer_servings'` and the dictionary value is `float`. If I do this and check the `dtypes`, you'll see that once again beer servings has been converted to floating point. So the only difference between this and above is that this method does it during the reading process; up here we converted it after the DataFrame had already been created.

Okay, so I'm going to show you one more dataset and we'll do another example with that. This is the dataset of orders from Chipotle, so we'll say `orders` equals `pd.read_table`, and it's bit.ly/chiporders. Let's take a look at it. The column I want to focus on is the item price column, and you might be wondering, is that a float, or is that some special currency type, or what is that? What you'll actually see is that pandas is storing that column as an object, meaning a string, because it doesn't really understand that these are numbers. So if you wanted to do some math with it, you're going to have to convert the type.

Now, we saw in a previous video how, if we want to remove a character from a Series, like this dollar sign, you can use a string method. So we'll say `orders.item_price.str.replace`, and we'll say I want to replace the dollar sign with nothing. Now you might think that's enough, but if you try to do a mathematical operation on the Series, it will give you an error, because even though I removed the dollar sign, the rest of it is still a string. So I have to cast it to a float in order to do any math with it. When I cast it to a float, I can now do a mathematical operation on it.

Okay, so as always we're going to end with a bonus, and for the bonus I want to talk about the item name column. So let's say `orders.item_name`, and we saw in a previous video about using the `str.contains` method to check for the presence of a substring. Chicken is what we're going to check for in this case, and it returns a Series of Trues and Falses indicating whether or not the given substring was found in that column. Now we've got Trues and Falses, but you can imagine a case in which we need these as zeros and ones. For instance, if you were building a machine learning model and this was one of your input features, you'd need it to be numeric, and you would use zeros and ones to represent Falses and Trues. That's actually super simple: you can just say `astype(int)`, and it has been converted to zeros and ones.

Okay, so that's it for today. Thank you so much for joining me. Feel free to subscribe if you'd like to see more videos like this. Please leave a question or a tip in the comments below and we can all help each other to learn. But that's it for today, so I hope to see you again soon.
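
The price-cleaning step described in the transcript can be sketched roughly as follows. The rows are invented to imitate the Chipotle data rather than downloaded from it. The `regex=False` argument is an addition not mentioned in the video: it makes pandas treat `$` as a literal character rather than a regular-expression anchor.

```python
import pandas as pd

# Made-up rows that imitate the item_price column described in the video.
orders = pd.DataFrame({
    "item_name": ["Chips and Fresh Tomato Salsa", "Izze", "Chicken Bowl"],
    "item_price": ["$2.39", "$3.39", "$16.98"],
})

# Removing "$" still leaves strings, so math on this column is not yet possible.
without_dollar = orders["item_price"].str.replace("$", "", regex=False)

# Casting to float gives a numeric Series that supports arithmetic.
prices = without_dollar.astype(float)

print(prices.mean())
```

Keeping the two steps separate makes it easy to see that the string replacement alone does not change the data type; only the `astype(float)` call does.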

Original Description

Have you ever tried to do math with a pandas Series that you thought was numeric, but it turned out that your numbers were stored as strings? In this video, I'll demonstrate two different ways to change the data type of a Series so that you can fix incorrect data types. I'll also show you the easiest way to convert a boolean Series to integers, which is useful for creating dummy/indicator variables for machine learning.

SUBSCRIBE to learn data science with Python: https://www.youtube.com/dataschool?sub_confirmation=1
JOIN the "Data School Insiders" community and receive exclusive rewards: https://www.patreon.com/dataschool

== RESOURCES ==
GitHub repository for the series: https://github.com/justmarkham/pandas-videos
"astype" documentation: http://pandas.pydata.org/pandas-docs/stable/generated/pandas.Series.astype.html

== LET'S CONNECT! ==
Newsletter: https://www.dataschool.io/subscribe/
Twitter: https://twitter.com/justmarkham
Facebook: https://www.facebook.com/DataScienceSchool/
LinkedIn: https://www.linkedin.com/in/justmarkham/

This video teaches how to change the data type of a pandas Series, including converting integer columns to floating point and converting string columns to numeric types, which is essential for data cleaning and preprocessing in machine learning. The video provides hands-on examples and demonstrations of how to use the `astype` method and the `dtype` parameter during CSV reading.
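
One preprocessing case from the video is turning a boolean Series into 0/1 indicator values for use as a model feature. The sketch below assumes a small hand-written Series in place of the real `item_name` column; the variable names `has_chicken` and `chicken_flag` are illustrative only.

```python
import pandas as pd

# Illustrative item names; not taken from the actual dataset.
item_name = pd.Series(["Chicken Bowl", "Izze", "Chicken Burrito", "Steak Bowl"])

# str.contains returns a boolean Series (case-sensitive by default).
has_chicken = item_name.str.contains("Chicken")

# astype(int) maps True to 1 and False to 0.
chicken_flag = has_chicken.astype(int)

print(chicken_flag.tolist())
```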

Key Takeaways
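  1. Import the pandas library
  2. Load a sample dataset (the video reads CSV data from a URL)
  3. Check the data types of the columns with the `dtypes` attribute
  4. Use the `astype` method to convert the data type of a column
  5. Use the `dtype` parameter of `read_csv` to set a column's type while the file is being read
  6. Strip non-numeric characters such as `$` before casting to float, and convert boolean Series to 0/1 with `astype(int)` for machine learning features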
💡 The `astype` method and the `dtype` parameter during CSV reading can be used to convert the data type of a pandas Series, which is essential for data cleaning and preprocessing in machine learning.
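
To illustrate step 5, here is a small sketch of passing `dtype` to `pd.read_csv`. An inline CSV string read through `io.StringIO` stands in for the real file so the example runs without network access; the two rows are placeholders, not the actual dataset.

```python
import io

import pandas as pd

# Inline CSV text stands in for a real file so the example is self-contained.
csv_text = (
    "country,beer_servings,continent\n"
    "Afghanistan,0,Asia\n"
    "Albania,89,Europe\n"
)

# Without dtype, pandas infers an integer type for beer_servings.
inferred = pd.read_csv(io.StringIO(csv_text))

# With dtype, the column is read as float from the start.
typed = pd.read_csv(io.StringIO(csv_text), dtype={"beer_servings": float})

print(inferred["beer_servings"].dtype)
print(typed["beer_servings"].dtype)
```

The dictionary only needs entries for the columns whose type should be overridden; the remaining columns keep their inferred types.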
