How do I select a pandas Series from a DataFrame?

Data School · Intermediate ·🔍 RAG & Vector Search ·10y ago

Key Takeaways

This video demonstrates how to select a pandas Series from a DataFrame using bracket notation and dot notation, with a focus on handling column names with spaces or conflicts with built-in methods.
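The two problem cases named above, a column name containing a space and a column name that clashes with an existing attribute, can be sketched on a tiny table. This is a minimal illustration, not the dataset from the video: the DataFrame `df`, its rows, and the lowercase `shape` column are assumptions made up for the example.

```python
import pandas as pd

# Hypothetical miniature table (not the video's dataset). The column names
# are chosen to show the two problem cases: a space in the name, and a
# clash with a built-in DataFrame attribute.
df = pd.DataFrame({
    "City": ["Ithaca", "Willingboro", "Holyoke"],
    "Colors Reported": [None, None, "RED"],
    "shape": ["TRIANGLE", "OTHER", "OVAL"],
})

# No space and no clash: both notations return the same Series.
same = df["City"].equals(df.City)

# A space in the name: only bracket notation can express it.
colors = df["Colors Reported"]

# A clash with a built-in attribute: dot notation returns the DataFrame's
# dimensions, while bracket notation returns the column.
dims = df.shape          # tuple of (rows, columns)
shape_col = df["shape"]  # the Series named "shape"

print(same)
print(type(dims).__name__, type(shape_col).__name__)
```

Bracket notation behaves the same in all three cases, which is why it is the safer default when column names are not under your control.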

Full Transcript

Hello everyone, and welcome back to my Q&A video series on the pandas library in Python. The question for today is: how do I select a pandas Series from a DataFrame?

There are two basic object types in pandas that hold data. One is called the DataFrame, and it's basically just a table of rows and columns. Each of those columns is known as a pandas Series. You can have a pandas Series that's not part of a DataFrame, but in my work I find that mostly I'm working with Series that are part of a DataFrame. So the question is, how do I select a Series from it? Perhaps I want to do an analysis on a particular Series, or perhaps I want to manipulate that Series. That's why you might want to do it.

So let's go ahead and create an example DataFrame and then answer this question. First, as always, we import pandas as pd. The dataset we're going to use today is a dataset of UFO reports: reports of UFO sightings from 1930 through 2000. I'm going to say ufo = pd. and there are two options here. We're going to start with read_table. We pass it the URL where the data file is stored, and as always I'm using a bit.ly URL so that you can follow along: bit.ly/uforeports. Then I'll say sep=','. The file at that location is a CSV file, a comma-separated values file, and read_table assumes tab-separated files by default, so we say the separator is a comma. I could run that and it would work, but the shortcut is to use read_csv. Literally the only difference between read_table and read_csv is that read_csv uses the comma as the default separator. So let's run that.

We want to confirm this is a DataFrame, so we'll just say type(ufo), and it is indeed a DataFrame. type is just a built-in Python function for looking at the type of an object. If we do ufo.head(), we can see the first five rows: the city, the state, the colors and shape reported, and the time of the UFO sighting.

To actually answer our question, which is how do we select a Series, the basic way to do this (and we'll see more) is to use what's known as bracket notation. I'll just show you first: ufo['City'] selects out the City Series. What's displayed is just City, the first 30 rows and the last 30 rows of that Series. So that's the basic method. Let's confirm this is a Series: we'll say type(ufo['City']), and we see it is indeed a Series.

This seems pretty simple, but here are a couple of notes that might be useful to remember. Bracket notation is case sensitive, so if I put in a lowercase 'city', that will generate an error; it will generate a KeyError.

There is a shortcut that is very common and useful to remember. You can instead use what's known as dot notation, and it looks like this: ufo.City. That does work. pandas is using a little trick here: every time a Series is added to a DataFrame, its name automatically becomes an attribute of that DataFrame. So if I type ufo. and hit Tab, you can see right there is City, along with all of these other methods and attributes. (I just hit the Tab key after the dot.) It literally makes the column name, the Series name, one of the attributes, which is very handy. Why would you want to do this? Just because it's less typing to say ufo.City than the much longer version with brackets and quotation marks. So ufo.City is quite handy.

A question that might have come up in your mind is: how would I select the Series 'Colors Reported'? Would I type ufo.Colors Reported? That doesn't seem quite right, because there's a space, and what is Python going to think? That will error. You could say, well, maybe I just delete the space, and no, that doesn't work either. You can try lots of different variations, but if the column name is 'Colors Reported' with a space, dot notation simply does not work. If you want to select out 'Colors Reported', you have to use bracket notation. I know the output might look strange: these are NaN, "not a number", values, and we'll talk about that in a future video. But you have to use bracket notation to select a Series if the Series name has a space in it.

This is also the case if you name a column something that conflicts with a built-in method or attribute. This is a tricky one. Let's pretend you had a column named shape, all lowercase, s-h-a-p-e. If you tried to use dot notation to select that column, it would fail, because shape is already a built-in attribute of a DataFrame. That would conflict; it would not work. You would get back the shape of the DataFrame, and you would not get back the pandas Series named shape, if there was one. The same thing would hold if you named a column head and then tried to access it using dot notation: you would also not get what you expected.

So the bottom line here is that dot notation only works under certain circumstances, whereas bracket notation will always work. If you like bracket notation, just use it. Just know that there are people out there like me who like to use dot notation because it's faster to type, so you want to recognize it. But dot notation will not always work for selecting a pandas Series.

I always end with a bonus tip, and the bonus tip for today is: how do I create a new pandas Series in a DataFrame? A quick diversion on that. In regular Python, if you have two strings, like 'ab', which is a string, and 'cd', which is a string, and you use the plus sign, that combines them; it concatenates them. You can do the same thing with pandas Series that are strings. You could do the same thing with pandas Series that are numbers and it would add them, but all of our Series here are strings. So if we said ufo.City + ufo.State, what's going to happen is it will just put the city next to the state, so you can see Ithaca and New York, Willingboro and New Jersey, Holyoke and Colorado. That looks kind of funny. Maybe we want to put in a comma and a space, if you're familiar with that American way, maybe worldwide way, of listing out city and state or city and country. Now we get the city, a comma, a space, and the state.

Perhaps we're doing this because we want to create a new column that includes both of those. You might think, well, maybe I just assign it: ufo.Location equals that expression. This is what you might think, but this is not the right way; it is not going to do what you think. You have to use bracket notation when creating a new Series in a DataFrame: ufo['Location'] equals that expression. That will work, and with ufo.head() you will see that we now have this Location column that is the city and the state. So if you're going to create a new column in a DataFrame, you have to use bracket notation when naming that column. You can use either notation on the right side of the equals sign, but when assigning it you have to use bracket notation. I could not have said ufo.Location here.

So that's it. Thanks so much for joining me for this video. Feel free to click Subscribe if you'd like to see more videos like this. As always, put a comment below if you have a question about the video or a question about pandas in general. Maybe I'll make a video about it, or I will at least answer your question in the comments. Thanks again, and I will see you soon.
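The loading and basic selection steps from the walkthrough above can be sketched as follows. To keep the example runnable without network access, it reads from an in-memory string instead of the URL given in the video; the `csv_text` contents, including the column names and every value other than the city and state pairs mentioned in the transcript, are illustrative assumptions rather than real rows from the dataset.

```python
import io
import pandas as pd

# Stand-in for the downloaded file: three illustrative rows in CSV form.
# Column names and the shape/time values are assumptions for this sketch.
csv_text = (
    "City,Colors Reported,Shape Reported,State,Time\n"
    "Ithaca,,TRIANGLE,NY,1/1/2000 00:00\n"
    "Willingboro,,OTHER,NJ,1/2/2000 00:00\n"
    "Holyoke,,OVAL,CO,1/3/2000 00:00\n"
)

# read_table needs the separator spelled out; read_csv defaults to a comma.
via_table = pd.read_table(io.StringIO(csv_text), sep=",")
via_csv = pd.read_csv(io.StringIO(csv_text))
same_result = via_table.equals(via_csv)

ufo = via_csv

# Bracket notation selects one column as a Series.
city = ufo["City"]

# Column lookup is case sensitive: a lowercase name raises KeyError.
try:
    ufo["city"]
    lowercase_failed = False
except KeyError:
    lowercase_failed = True

# Dot notation is a shortcut that returns the same Series here.
dot_matches = ufo.City.equals(city)

print(type(ufo).__name__, type(city).__name__)
```

Swapping `io.StringIO(csv_text)` for a file path or URL should leave the rest of the sketch unchanged.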

Original Description

DataFrames and Series are the two main object types in pandas for data storage: a DataFrame is like a table, and each column of the table is called a Series. You will often select a Series in order to analyze or manipulate it. In this video, I'll show you how to select a Series using "bracket notation" and "dot notation", and will discuss the limitations of dot notation. I'll also demonstrate how to create a new Series in a DataFrame.

SUBSCRIBE to learn data science with Python: https://www.youtube.com/dataschool?sub_confirmation=1
JOIN the "Data School Insiders" community and receive exclusive rewards: https://www.patreon.com/dataschool

== RESOURCES ==
GitHub repository for the series: https://github.com/justmarkham/pandas-videos

== LET'S CONNECT! ==
Newsletter: https://www.dataschool.io/subscribe/
Twitter: https://twitter.com/justmarkham
Facebook: https://www.facebook.com/DataScienceSchool/
LinkedIn: https://www.linkedin.com/in/justmarkham/

This video teaches how to select a pandas Series from a DataFrame using bracket notation and dot notation, and how to create a new Series in a DataFrame. It covers the basics of pandas data manipulation and provides tips for handling column names with spaces or conflicts.

Key Takeaways
  1. Import pandas library
  2. Create a sample DataFrame
  3. Select a Series using bracket notation
  4. Select a Series using dot notation
  5. Handle column names with spaces or conflicts
  6. Create a new Series in a DataFrame using bracket notation
💡 Bracket notation is a more flexible and reliable way to select a Series from a DataFrame, especially when dealing with column names that have spaces or conflicts with built-in methods.
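The last takeaway, creating a new Series with bracket notation, can be sketched as below. The three city and state pairs are the ones read out in the video; the `scratch` copy and the suppression of the warning are choices made for this example, not something shown in the source.

```python
import warnings
import pandas as pd

ufo = pd.DataFrame({
    "City": ["Ithaca", "Willingboro", "Holyoke"],
    "State": ["NY", "NJ", "CO"],
})

# Adding string Series concatenates them element by element.
combined = ufo.City + ", " + ufo.State

# Attribute assignment only sets a plain Python attribute on the object;
# it does not add a column. pandas warns about this, so the warning is
# silenced here. A copy is used so the original frame stays clean.
scratch = ufo.copy()
with warnings.catch_warnings():
    warnings.simplefilter("ignore")
    scratch.Location = combined
created_by_dot = "Location" in scratch.columns

# Bracket assignment is what actually creates the new column.
ufo["Location"] = combined
created_by_bracket = "Location" in ufo.columns

print(created_by_dot, created_by_bracket)
print(ufo["Location"].tolist())
```

Either notation is fine on the right-hand side of the assignment, as in `ufo.City + ", " + ufo.State`; only the left-hand side needs brackets.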
