How do I select a pandas Series from a DataFrame?
Key Takeaways
This video demonstrates how to select a pandas Series from a DataFrame using bracket notation and dot notation, with a focus on handling column names with spaces or conflicts with built-in methods.
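As a rough illustration of those points, here is a minimal sketch. It assumes a small hand-built DataFrame as a stand-in for the video's UFO dataset (only the City, State, and Colors Reported column names come from the video), and the column named shape is hypothetical, added only to show the naming conflict.

```python
import pandas as pd

# Hypothetical stand-in for the UFO reports data used in the video.
ufo = pd.DataFrame({
    "City": ["Ithaca", "Willingboro", "Holyoke"],
    "Colors Reported": [None, None, None],  # missing values, shown as NaN/None
    "State": ["NY", "NJ", "CO"],
    "shape": ["a", "b", "c"],  # hypothetical column that clashes with DataFrame.shape
})

# Bracket notation works for any column name.
city_by_bracket = ufo["City"]

# Dot notation works when the name is a valid identifier with no conflicts.
city_by_dot = ufo.City

# A name containing a space can only be reached with brackets.
colors = ufo["Colors Reported"]

# A name that conflicts with a built-in attribute: the attribute wins.
shape_attr = ufo.shape    # (rows, columns) tuple of the DataFrame
shape_col = ufo["shape"]  # the column itself

# Column lookups are case sensitive: "city" is not "City".
try:
    ufo["city"]
    lowercase_lookup_failed = False
except KeyError:
    lowercase_lookup_failed = True
```

Bracket notation is the form that behaves the same way in every case here, which is why the video recommends falling back to it whenever dot notation is in doubt.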
Full Transcript
Hello everyone, and welcome back to my Q&A video series on the pandas library in Python. The question for today is: how do I select a pandas Series from a DataFrame?

There are two basic object types in pandas that hold data. One is called the DataFrame, and it's basically just a table of rows and columns. Each of those columns is known as a pandas Series. You can have a pandas Series that's not part of a DataFrame, but in my work I find that mostly I'm working with Series that are part of a DataFrame. So the question is, how do I select a Series from it? Perhaps I want to do an analysis on a particular Series, or perhaps I want to manipulate that Series. That's why you might want to do it.

So let's go ahead and create an example DataFrame and then answer this question. First, as always, we're going to import pandas as pd. The dataset we're going to use today is a dataset of UFO reports: reports of UFO sightings from 1930 through 2000. I'm going to say ufo = pd. and there are two options here. We're going to start with read_table. We're going to pass it the URL where the data file is stored, and as always I'm using a bitly URL so that you can follow along: bit.ly/uforeports. And then I'll say sep=','. The file at that location is a CSV file, a comma-separated values file, and read_table assumes tab-separated files by default, so we're going to say the sep is a comma. I could do that, run it, and it will work, but the shortcut is actually to use read_csv. Literally the only difference between read_table and read_csv is that read_csv uses the comma as the default separator. So let's run that.

We want to confirm this is a DataFrame, so we'll just say type(ufo), and it is indeed a DataFrame. type is just a built-in Python function for looking at the type of an object. And if we do .head(), we can see the first five rows: the city, the state, the colors and shape reported, and the time of
the UFO sighting.

So, to actually answer our question, which is how do we select a Series, let's go ahead and do that. The basic way to do this, and we'll see more, is to use what's known as bracket notation. I'll just show you first: ufo['City'], and that will select out the City Series. This is just City: the first 30 rows and the last 30 rows of that Series. That's the basic method. Let's just confirm this is a Series: we'll just say type, and we see it is indeed a Series.

It seems pretty simple, but a couple of notes might be useful to remember. Bracket notation is case sensitive, so if I put in a lowercase 'city', that'll generate an error: a KeyError.

There is a shortcut to this that is very common and useful to remember. You can instead use what's known as dot notation, and it looks like this: ufo.City. And that does work. pandas is using a little trick here: every time a Series is added to a DataFrame, its name automatically becomes an attribute of that DataFrame. So if I say ufo. and hit Tab, you can see right there is City, along with all of these other methods and attributes. I just hit the Tab key after the dot, and it literally makes the column name, the Series name, one of the attributes, which is very handy. Now why would you want to do this? Just because it's less typing to say ufo.City than the much longer ufo['City'] with brackets and quotation marks. So ufo.City is quite handy.

A question that might have come up in your mind is: how would I select the Series Colors Reported? Would I do ufo.Colors Reported? That doesn't seem quite right, because there's a space, and what's Python going to think? That will error. You could say, well, maybe I just delete the space, and no, that doesn't work either. You can try lots of different variations,
but if the column name is Colors Reported, with a space, dot notation simply does not work. If you want to select out Colors Reported, you have to use bracket notation: ufo['Colors Reported']. I know that might look strange; these are NaN, not a number, and we'll talk about that in a future video. But you have to use bracket notation to select a Series if the Series name has a space in it.

This is also the case if you name a column something that conflicts with a built-in method or attribute. This is a tricky one. Let's pretend you had a column named shape, lowercase: s-h-a-p-e. If you tried to use dot notation to select that column, it would fail, because shape is already a built-in attribute of a DataFrame. That would conflict. You would just get back the shape of the DataFrame; you would not get back the pandas Series named shape, if there was one. The same thing would hold if you named a column head and then tried to access it using dot notation: you would not get what you expected.

So the bottom line here is that dot notation only works under certain circumstances, whereas bracket notation will always work. If you like bracket notation, just use it. Just know that there are people out there like me who like to use dot notation because it's faster to type, so you want to recognize it. But dot notation will not always work for selecting a pandas Series.

I'll always end with a bonus tip, and the bonus tip for today is: how do I create a new pandas Series in a DataFrame? A quick diversion on that. In regular Python, if you have two strings, like 'ab' and 'cd', and you use the plus sign, that combines them; that concatenates them. You can do the same thing with pandas Series that are strings. You could do the same thing with pandas Series that are numbers, and it would add them, but all of our Series here are strings. So if we said ufo.City +
ufo.State, what's going to happen? It will actually just put the city next to the state, so you can see Ithaca and New York, Willingboro New Jersey, Holyoke Colorado. Now that looks kind of funny. Maybe we want to actually put in a comma and a space, if you're familiar with that American way, maybe worldwide way, of listing out cities and states, or city and country. And now we get the city, comma, space, state.

Now perhaps we're doing this because we want to create a new column that includes both of those. You might think, well, maybe I just assign it: ufo. and we'll call it Location, equals. This is what you might think, but this is not the right way; this is not going to do what you think. You have to use bracket notation when creating a new Series in a DataFrame. So that will work, and with ufo.head() you will see now we've got this Location column that is the city and the state. So if you're going to create a new column in a DataFrame, you have to use bracket notation when naming that column. You can use either notation on the right side of the equals sign, but when assigning it you have to use bracket notation. I could not have said ufo.Location here.

So that's it. Thanks so much for joining me for this video. Feel free to click Subscribe if you'd like to see more videos like this. As always, put a comment below if you have a question about the video or a question about pandas in general. Maybe I'll make a video about it, or I will at least answer your question in the comments. Thanks again, and I will see you soon.
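The bonus tip can be sketched as follows. This is a minimal illustration, not the exact code from the video: it assumes a tiny hand-built DataFrame in place of the UFO file, and the variable name attempt is hypothetical.

```python
import warnings

import pandas as pd

# Hypothetical stand-in for the UFO reports data used in the video.
ufo = pd.DataFrame({
    "City": ["Ithaca", "Willingboro", "Holyoke"],
    "State": ["NY", "NJ", "CO"],
})

# Attribute assignment on a copy: this sets a plain Python attribute
# and does NOT add a column (pandas emits a UserWarning about it).
attempt = ufo.copy()
with warnings.catch_warnings():
    warnings.simplefilter("ignore")
    attempt.Location = attempt.City + ", " + attempt.State

# Bracket notation on the left side creates a real column.
# Either notation is fine on the right side of the equals sign.
ufo["Location"] = ufo.City + ", " + ufo["State"]

print(ufo.head())
```

After this runs, "Location" appears in ufo.columns but not in attempt.columns, which is the difference the video warns about.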
Original Description
DataFrames and Series are the two main object types in pandas for data storage: a DataFrame is like a table, and each column of the table is called a Series. You will often select a Series in order to analyze or manipulate it. In this video, I'll show you how to select a Series using "bracket notation" and "dot notation", and will discuss the limitations of dot notation. I'll also demonstrate how to create a new Series in a DataFrame.
SUBSCRIBE to learn data science with Python:
https://www.youtube.com/dataschool?sub_confirmation=1
JOIN the "Data School Insiders" community and receive exclusive rewards:
https://www.patreon.com/dataschool
== RESOURCES ==
GitHub repository for the series: https://github.com/justmarkham/pandas-videos
== LET'S CONNECT! ==
Newsletter: https://www.dataschool.io/subscribe/
Twitter: https://twitter.com/justmarkham
Facebook: https://www.facebook.com/DataScienceSchool/
LinkedIn: https://www.linkedin.com/in/justmarkham/