A minimalist's guide to slicing and indexing pandas DataFrames

Brandon Rohrer · Beginner · 📰 AI News & Updates · 7y ago

Key Takeaways

This video tutorial by Brandon Rohrer provides a minimalist guide to slicing and indexing pandas DataFrames, covering label-based and position-based indexing using the .loc and .iloc functions.

Full Transcript

Hi, this is Brandon Rohrer, and this is a minimalist's guide to slicing and indexing pandas DataFrames. There are a lot of ways to pull the elements, rows, and columns from a DataFrame. If you're feeling brave sometime, check out the seven-part series on pandas indexing linked to below. Some indexing methods appear very similar but behave very differently. The goal of this post is to identify a single strategy for pulling data from a DataFrame that's straightforward to interpret and produces reliable results. And just a heads up: these are my own thoughts only. There's no guarantee that it's authoritative or even right.

Now, in case you wanted to skip to the end, here's the bottom line. One, use .loc for labels. Two, use .iloc for positions. And three, explicitly designate both rows and columns, even if it's with a colon. We'll step through some examples to illustrate these. Below is a link to the Python script if you'd like to run them yourself.

To start with, we'll create a small DataFrame using data from Wikipedia on the highest mountains in the world. For each mountain we have its name, height in meters, year when it was first summited, and the range to which it belongs. If this is your first exposure to a pandas DataFrame, each mountain and its associated information is a row, and each piece of information, for instance name or height, is a column. Each column has a name associated with it within pandas, also known as a label. The labels for our columns are name, height in meters, summited, and mountain range. In pandas DataFrames, each row also has a name. Now, by default this label is just the row number, counting starting at zero. However, you can set one of your columns to be the index of your DataFrame, which means that its values will be used as the row labels. We'll set our name column as our index.

It's a common operation to pick out one of the DataFrame's columns to work on. To select a column by its label, we use the .loc function. One thing we can do that makes our commands easy to interpret is to always include both the row index and the column index that we're interested in. In this case we're interested in all of the rows, so to show this we use a colon. Then, to indicate the column that we're interested in, we add its label. The command mountains.loc[:, 'summited'] gets us just the summited column. It's worth noting that this command returns a Series, the pandas data structure that's used to represent a column. If instead of a Series we just wanted an array of the numbers that are in the summited column, we can add .values to the end of this command. That would return a NumPy array containing 1953, 1954, 1955, and 1956.

If we would only like to get a single row, then we can use the .loc function again, this time specifying a row label and putting a colon in the column position. If we only want a single value, for instance the year that K2 was summited, then we can specify the labels for both the row and the column. The row always comes first. While it's true that you can get away with using only one argument in the .loc function, it's most straightforward to interpret if you always specify both the row and column, even if it's with a colon.

We don't have to limit ourselves to a single row or a single column using this method. Here, in the row position, we pass a list of labels. This returns a set of rows rather than just one. We can also get a subset of the columns by specifying the start and end column and putting a colon in between. In this case, height colon summited will give us all of the columns between and including the start point, height, and the end point, summited. Note that this is different from numerical indexing in NumPy, where the endpoint is omitted by default. Also, because we've already specified the name column as the index, the name will also be returned in the DataFrame that we get back.

In addition, we can select rows or columns where a value meets a certain condition. In this case we want to find the rows where the values of the summited column are greater than 1954. In the rows position we can put any boolean expression that has the same number of values as we have rows, and we could do this for the columns as well if we wished.

As an alternative to selecting rows and columns by their labels, we can select them by their row and/or column number. The ordering of the columns, and thus their positions, depends on how the DataFrame is initialized. The index column, our name column, doesn't get counted in this case. To select data by its position, we use the .iloc function. Again, the first argument is for the rows and the second argument is for the columns. To select all the columns in the zeroth row, for instance, we write .iloc[0, :]. Similarly, we can select a column by position by putting the column number we want in the column argument of the .iloc function. We can pull out a single value by specifying both the position of the row and the column. We can pass a list of positions if we want to cherry-pick certain rows and/or certain columns. We can also use the colon range operator to get a contiguous set of rows or columns by position. Note that, unlike .loc using labels, the .iloc function using positions does not include the endpoint. In this case it returns only columns 0 and 1 and does not return column 2.

All of this can be summed up as follows. One, use .loc for label-based indexing. Two, use .iloc for position-based indexing. And three, explicitly designate both the rows and the columns, even if it's with a colon. This set of guidelines will give you a consistent and straightforwardly interpretable way to pull the data that you need from a pandas DataFrame. Good luck with your data munging.
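The commands described in the transcript can be sketched roughly as follows. This is a minimal illustration, not the script linked from the video. The transcript only confirms the DataFrame name (mountains), the K2 row, and the summit years; the other mountain names, the heights, the ranges, and the exact column label "height (m)" are assumptions filled in from general knowledge.

```python
import pandas as pd

# Illustrative data. Only the summit years and the K2 row come from the
# transcript; the remaining values and the "height (m)" label are assumed.
mountains = pd.DataFrame({
    "name": ["Everest", "K2", "Kangchenjunga", "Lhotse"],
    "height (m)": [8848, 8611, 8586, 8516],
    "summited": [1953, 1954, 1955, 1956],
    "mountain range": [
        "Mahalangur Himalaya",
        "Baltoro Karakoram",
        "Kangchenjunga Himalaya",
        "Mahalangur Himalaya",
    ],
})

# Use the name column as the row labels (the index).
mountains = mountains.set_index("name")

# --- Label-based selection with .loc: rows first, then columns ---
summited = mountains.loc[:, "summited"]        # one column, as a Series
years = mountains.loc[:, "summited"].values    # same column, as a NumPy array
k2_row = mountains.loc["K2", :]                # one row, as a Series
k2_year = mountains.loc["K2", "summited"]      # a single value
two_rows = mountains.loc[["K2", "Lhotse"], :]  # a list of row labels
# A label slice includes BOTH endpoints.
label_slice = mountains.loc[:, "height (m)":"summited"]

# --- Position-based selection with .iloc: rows first, then columns ---
# The index (name) is not counted as a column position.
first_row = mountains.iloc[0, :]               # zeroth row, all columns
third_col = mountains.iloc[:, 2]               # all rows, column 2
one_value = mountains.iloc[1, 1]               # row 1, column 1
picked = mountains.iloc[[0, 3], :]             # cherry-picked rows
# A position slice EXCLUDES the endpoint: columns 0 and 1 only.
pos_slice = mountains.iloc[:, 0:2]
```

Under these assumptions, label_slice and pos_slice pick out the same two columns, which shows the difference in endpoint handling: the label slice names its last column explicitly, while the position slice stops one short of 2.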

Original Description

This is part of a free Data Munging course. Feel free to browse for other tips and tricks: https://end-to-end-machine-learning.teachable.com/p/data-munging-tips-and-tricks [blog post] http://brohrer.github.io/dataframe_indexing.html Follow me for announcements and updates: https://twitter.com/_brohrer_

This video tutorial provides a guide to slicing and indexing pandas DataFrames using the .loc and .iloc functions, covering label-based and position-based indexing. The tutorial aims to provide a consistent and straightforwardly interpretable way to pull data from a pandas DataFrame.

Key Takeaways
  1. Create a sample DataFrame
  2. Use .loc for label-based indexing
  3. Use .iloc for position-based indexing
  4. Select specific rows and columns using .loc and .iloc
  5. Use boolean expressions to select rows and columns based on conditions
💡 Using .loc for label-based indexing and .iloc for position-based indexing provides a consistent and straightforwardly interpretable way to pull data from a pandas DataFrame.
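Takeaway 5 can be sketched as below. This is a small illustration under assumed data: the heights and the "height (m)" column label are not given in the video and are filled in for the example. The column-mask rule (keep columns whose label contains "height") is also just a hypothetical condition chosen to show that a boolean expression works in the column position too.

```python
import pandas as pd

# Assumed sample data; only the summit years come from the video.
mountains = pd.DataFrame(
    {
        "height (m)": [8848, 8611, 8586, 8516],
        "summited": [1953, 1954, 1955, 1956],
    },
    index=pd.Index(["Everest", "K2", "Kangchenjunga", "Lhotse"], name="name"),
)

# Row condition: one boolean per row, placed in the rows position.
row_mask = mountains.loc[:, "summited"] > 1954
late = mountains.loc[row_mask, :]

# Column condition: one boolean per column, placed in the columns position.
col_mask = mountains.columns.str.contains("height")
heights_only = mountains.loc[:, col_mask]
```

In both cases the mask needs exactly one True/False value per row (or per column), and the other position is still filled in explicitly with a colon.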
