What do I need to know about the pandas index? (Part 2)

Data School · Beginner ·🛡️ AI Safety & Ethics ·10y ago

Key Takeaways

The video discusses the pandas index, specifically the Series index, and how it enables alignment during mathematical operations and concatenation. It covers index-based selection, sorting, and automatic index alignment.
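A minimal sketch of the selection and sorting ideas summarized above. The small DataFrame is a hypothetical stand-in for the drinks dataset used in the video (its values are illustrative only), so the example runs without a network connection.

```python
import pandas as pd

# Toy stand-in for the drinks dataset; the values are illustrative only.
drinks = pd.DataFrame({
    'country': ['Afghanistan', 'Albania', 'Algeria', 'Andorra', 'Angola'],
    'beer_servings': [0, 89, 25, 245, 217],
    'continent': ['Asia', 'Europe', 'Africa', 'Europe', 'Africa'],
})

# A Series selected from a DataFrame carries the DataFrame's index.
drinks = drinks.set_index('country')
continent = drinks['continent']
print(continent.index.tolist())

# value_counts() returns a Series whose index holds the distinct values.
counts = continent.value_counts()
print(counts['Africa'])  # select a value by its index label

# Sort by the values, or by the index labels.
by_value = counts.sort_values()
by_label = counts.sort_index()
print(by_label)
```

Because `value_counts()` returns an ordinary Series, the same bracket selection and sorting methods apply to it as to any other Series.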

Full Transcript

Hello, and welcome back to my Q&A video series on the pandas library in Python. Today's video is a continuation from last time, when we were talking about the pandas index. Today we're going to talk about the Series index, and we'll also talk about alignment. If you haven't watched that first video, I would go ahead and watch that now and then come on back here.

Okay, so we are going to start with the same dataset we used last time. We'll import pandas as pd, and then we're going to use this dataset of alcohol consumption by country. We're going to use pd.read_csv, and the address is bit.ly/drinksbycountry. We'll take a look at the head, and just like last time, we see that a DataFrame always has an index, right here.

What I want to show you is that a Series also has an index, and it comes from the DataFrame. For example, if I select the continent Series and call .head(), you will see 0, 1, 2, 3, 4, which is the index for the Series, and it came from the DataFrame. The index is on the left and the values are on the right.

Now let's pretend that we didn't use the default index for the DataFrame and instead set something else as the index. Let's say drinks.set_index, and we're going to set the index as country, with inplace=True. Let's take another look at it, and we see that country has been turned into the index. So what's going to happen when we select the continent Series? We run drinks.continent.head(), and this time we're seeing the same thing as last time: the index is on the left and the values are on the right. This is just a pandas Series. The real contents are the values, but the index came from the DataFrame and is attached to each row.

You've actually seen Series many times before in these videos, and they all include an index. You probably just didn't notice it, or notice what it was. For example, if I say drinks.continent.value_counts(), this is actually a Series, and as such it has an index (the .index attribute) and it has values (the .values attribute). Because the output is a Series, and not some special value counts object, we can use the index to select values from the Series. What I mean is that I can take this and refer to an index label, let's say 'Africa', and put it in brackets. I'm saying: from this Series, find the index Africa and show me the value. This is kind of like how we used loc with the DataFrame to pull out the contents by referring to the index and the column name. Here I just say which index I'm looking for, it's Africa, and it shows me the value. Again, this worked because this is a Series object.

Next I want to talk about sorting. Do you remember how you can sort the values of a Series? We saw that in a previous video: you just call .sort_values(), and that sorts in ascending order. So now you can see that the Series is sorted by its values. But what if I want to sort the index itself, say in ascending order? I can use something called .sort_index(), and there you go, it's now in ascending alphabetical order.

In the last video we talked about three reasons the index exists: identification, selection, and alignment. I want to show you what alignment means, and for that I need to create another Series. I'm going to call it people, and I'm going to say pd.Series. Just follow along with me here: I'm going to pass in a list of the numbers 3,000,000 and 85,000, then index equals a list, 'Albania' and 'Andorra', and then name equals 'population'. This is one way to create a pandas Series. Let's look at it: the population of Albania is approximately 3 million, and the population of Andorra is about 85,000. That's how I constructed the Series: I gave it the values, I gave it the index, and I gave it a name.

Now, this is my tiny dataset, and let's say I want to use it together with the drinks dataset to calculate the total beer servings for each country. I'm going to do that by multiplying the number of people by the data in the beer_servings Series, which tells you the average per person. So I take the average per person and multiply it by the number of people to see the total beer servings per year in that country. All I'm going to do is say drinks.beer_servings (which, by the way, I'll just remind you of what that looks like) times people.

When I run that, check out what happens. For countries that were not represented in the people Series, it says, well, I can't do that math because I need the number of people, so it marks the result as NaN, "not a number", which means a missing value. However, for the ones where it does have the population, it does the multiplication. And here's the thing: it aligned them by the index. It didn't just take these two numbers and multiply them by the first two rows. It found Albania, 3 million, and multiplied 3 million by the beer servings number for Albania, and for Andorra it did the same thing.

So, in summary, alignment allows us to put data together and work with it together even if it's not exactly the same length, as long as you tell it which rows correspond to which other rows. I took the Albania data here and the Andorra data here, and it knew how to do the math based upon this shared index.

I want to end today with a bonus, as always. The question I would pose is: if I just wanted to take this people Series and add it to the DataFrame, how would I do that? I'm going to use a function called pd.concat, short for concatenation. concat can be used to concatenate rows on top of other rows, or columns next to other columns, and the way we control that is with the axis parameter. I'm going to pass it a list with the drinks object, which is the DataFrame, and the people Series, and I'll say axis equals 1, which essentially means put these objects side by side. When we do that and call .head(), check that out: we've got a new column, and even though it didn't have complete data, it put the data in the right spots because of the Series index. That is the beauty of the automatic alignment that pandas does using the index.

Okay, so that's it for today. Please click subscribe if you'd like to see more videos like this. I'll be talking about the MultiIndex, which I know some of you want to hear about, in a future video. Let me know in the comments section below if you have any questions. Thanks again for joining me, and I hope to see you soon.
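The multiplication described in the transcript can be sketched as follows. The `beer_servings` Series here is a hypothetical, hand-built stand-in for `drinks.beer_servings` with illustrative values. The `people` Series mirrors the one constructed in the video.

```python
import pandas as pd

# Toy per-person beer servings, indexed by country (illustrative values only).
beer_servings = pd.Series(
    [0, 89, 25, 245],
    index=['Afghanistan', 'Albania', 'Algeria', 'Andorra'],
    name='beer_servings',
)

# Population for only two of the countries, as in the transcript.
people = pd.Series(
    [3000000, 85000],
    index=['Albania', 'Andorra'],
    name='population',
)

# Multiplication pairs rows by index label, not by position.
# Labels missing from either Series produce NaN in the result.
total_beer = beer_servings * people
print(total_beer)
```

Note that `people` has two rows while `beer_servings` has four. The result covers every label from both Series, and only the labels present in both receive a computed value.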

Original Description

In part two of our discussion of the index, we'll switch our focus from the DataFrame index to the Series index. After discussing index-based selection and sorting, I'll demonstrate how automatic index alignment during mathematical operations and concatenation enables us to easily work with incomplete data in pandas.

SUBSCRIBE to learn data science with Python: https://www.youtube.com/dataschool?sub_confirmation=1
JOIN the "Data School Insiders" community and receive exclusive rewards: https://www.patreon.com/dataschool

== RESOURCES ==
GitHub repository for the series: https://github.com/justmarkham/pandas-videos
"set_index" documentation: http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.set_index.html
"value_counts" documentation: http://pandas.pydata.org/pandas-docs/stable/generated/pandas.Series.value_counts.html
"sort_values" documentation: http://pandas.pydata.org/pandas-docs/stable/generated/pandas.Series.sort_values.html
"sort_index" documentation: http://pandas.pydata.org/pandas-docs/stable/generated/pandas.Series.sort_index.html
"Series" documentation: http://pandas.pydata.org/pandas-docs/stable/generated/pandas.Series.html
"concat" documentation: http://pandas.pydata.org/pandas-docs/stable/generated/pandas.concat.html
Indexing and selecting data: http://pandas.pydata.org/pandas-docs/stable/indexing.html

== LET'S CONNECT! ==
Newsletter: https://www.dataschool.io/subscribe/
Twitter: https://twitter.com/justmarkham
Facebook: https://www.facebook.com/DataScienceSchool/
LinkedIn: https://www.linkedin.com/in/justmarkham/

This video teaches how to work with the pandas Series index, including index-based selection, sorting, and automatic index alignment during mathematical operations and concatenation.

Key Takeaways
  1. Import pandas library
  2. Create a Series object
  3. Understand index-based selection
  4. Sort a Series object
  5. Perform mathematical operations using index alignment
  6. Concatenate a Series object with a DataFrame
💡 The pandas index enables automatic alignment during mathematical operations and concatenation, making it easier to work with incomplete data.
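The last takeaway, concatenating a Series with a DataFrame, might look like the sketch below. The `drinks` DataFrame is again a hypothetical toy with illustrative values, already indexed by country. Only `pd.concat` with `axis=1` is taken from the video.

```python
import pandas as pd

# Toy DataFrame indexed by country (illustrative values only).
drinks = pd.DataFrame(
    {
        'beer_servings': [0, 89, 25, 245],
        'continent': ['Asia', 'Europe', 'Africa', 'Europe'],
    },
    index=pd.Index(
        ['Afghanistan', 'Albania', 'Algeria', 'Andorra'], name='country'
    ),
)

# Partial population data: only two of the four countries are present.
people = pd.Series(
    [3000000, 85000],
    index=['Albania', 'Andorra'],
    name='population',
)

# axis=1 places the objects side by side, matching rows by index label.
# The Series name becomes the new column name.
combined = pd.concat([drinks, people], axis=1)
print(combined)
```

Rows without a matching label in `people` get NaN in the new `population` column, which is the same missing-value behavior seen with index-aligned arithmetic.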
