A minimalist's guide to slicing and indexing pandas DataFrames
Key Takeaways
This video tutorial by Brandon Rohrer provides a minimalist guide to slicing and indexing pandas DataFrames, covering label-based and position-based indexing using the .loc and .iloc indexers.
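A minimal sketch of why label-based and position-based lookups are worth keeping apart. It uses a tiny made-up DataFrame (the names df, a, and b are hypothetical, not from the video) whose integer row labels run in the opposite order to the row positions. With both axes designated, .loc and .iloc each read unambiguously, and they disagree about which row is "0":

```python
import pandas as pd

# Hypothetical example data: row labels 2, 1, 0 deliberately run opposite
# to row positions 0, 1, 2.
df = pd.DataFrame({"a": [10, 20, 30], "b": [40, 50, 60]}, index=[2, 1, 0])

# .loc[rows, columns] looks up LABELS: row label 0 is the last row.
by_label = df.loc[0, :]

# .iloc[rows, columns] looks up POSITIONS: position 0 is the first row.
by_position = df.iloc[0, :]

print(by_label["a"])     # 30
print(by_position["a"])  # 10
```

Writing df.loc[0, :] returns the same row as df.loc[0]; the explicit colon only makes it visible that every column is wanted.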
Full Transcript
Hi, this is Brandon Rohrer, and this is a minimalist guide to slicing and indexing pandas DataFrames. There are a lot of ways to pull the elements, rows, and columns from a DataFrame. If you're feeling brave sometime, check out the seven-part series on pandas indexing linked below. Some indexing methods appear very similar but behave very differently. The goal of this post is to identify a single strategy for pulling data from a DataFrame that's straightforward to interpret and produces reliable results. And just a heads-up: these are my own thoughts only. There's no guarantee that they're authoritative, or even right.

Now, in case you want to skip to the end, here's the bottom line. One: use .loc for labels. Two: use .iloc for positions. Three: explicitly designate both rows and columns, even if it's with a colon. We'll step through some examples to illustrate these. Below is a link to the Python script if you'd like to run them yourself.

To start with, we'll create a small DataFrame using data from Wikipedia on the highest mountains in the world. For each mountain we have its name, its height in meters, the year it was first summited, and the range to which it belongs. If this is your first exposure to a pandas DataFrame: each mountain and its associated information is a row, and each piece of information, for instance name or height, is a column. Each column has a name associated with it, known within pandas as a label. The labels for our columns are name, height (in meters), summited, and mountain range. In pandas DataFrames, each row also has a name. By default, this label is just the row number, counting from zero. However, you can set one of your columns to be the index of your DataFrame, which means that its values will be used as the row labels. We'll set our name column as our index.

It's a common operation to pick out one of the DataFrame's columns to work on. To select a column by its label, we use .loc. One thing we can do that makes our commands easy to interpret is to always
include both the row index and the column index that we're interested in. In this case we're interested in all of the rows, so to show this we use a colon. Then, to indicate the column that we're interested in, we add its label. The command mountains.loc[:, 'summited'] gets us just the summited column. It's worth noting that this command returns a Series, the pandas data structure that's used to represent a column. If, instead of a Series, we just wanted an array of the numbers in the summited column, we can add .values to the end of this command. That returns a NumPy array containing 1953, 1954, 1955, and 1956.

If we would only like to get a single row, we can use .loc again, this time specifying a row label and putting a colon in the column position. If we only want a single value, for instance the year that K2 was summited, then we can specify the labels for both the row and the column. The row always comes first. While it's true that you can get away with using only one argument in .loc, it's most straightforward to interpret if you always specify both the row and the column, even if it's with a colon.

We don't have to limit ourselves to a single row or a single column using this method. Here, in the row position, we pass a list of labels. This returns a set of rows rather than just one. We can also get a subset of the columns by specifying the start and end columns with a colon in between. In this case, height:summited gives us all of the columns between and including the start point, height, and the end point, summited. Note that this is different from numerical indexing in NumPy, where the endpoint is excluded. Also, because we've already set the name column as the index, the names will be returned in the DataFrame that we get back.

In addition, we can select rows or columns where a value meets a certain condition. In this case, we want to find the rows where the values of the summited
column are greater than 1954. In the row position we can put any Boolean expression that has the same number of values as we have rows, and we could do this for the columns as well if we wished.

As an alternative to selecting rows and columns by their labels, we can select them by their row and/or column number. The ordering of the columns, and thus their positions, depends on how the DataFrame is initialized. The index column, in this case our name column, doesn't get counted. To select data by its position, we use .iloc. Again, the first argument is for the rows and the second argument is for the columns. To select all the columns in the zeroth row, for instance, we write .iloc[0, :]. Similarly, we can select a column by position by putting the column number we want in the column argument of .iloc. We can pull out a single value by specifying the positions of both the row and the column. We can pass a list of positions if we want to cherry-pick certain rows and/or columns. We can also use the colon range operator to get a contiguous set of rows or columns by position. Note that, unlike .loc with labels, .iloc with positions does not include the endpoint. In this case, it returns only columns 0 and 1 and does not return column 2.

All of this can be summed up as follows. One: use .loc for label-based indexing. Two: use .iloc for position-based indexing. Three: explicitly designate both the rows and the columns, even if it's with a colon. This set of guidelines will give you a consistent and straightforwardly interpretable way to pull the data that you need from a pandas DataFrame. Good luck with your data munging!
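The selections walked through above can be reproduced roughly as follows. This is a sketch, not the script linked from the video: the column labels ("name", "height", "summited", "mountain range") are inferred from the narration, and the heights and range names are filled in from Wikipedia's list of highest mountains for illustration only. The transcript itself confirms just K2 and the summit years 1953 to 1956.

```python
import pandas as pd

# Assumed reconstruction of the video's example data (heights in meters).
mountains = pd.DataFrame({
    "name": ["Everest", "K2", "Kangchenjunga", "Lhotse"],
    "height": [8848, 8611, 8586, 8516],
    "summited": [1953, 1954, 1955, 1956],
    "mountain range": [
        "Mahalangur Himalaya",
        "Baltoro Karakoram",
        "Kangchenjunga Himalaya",
        "Mahalangur Himalaya",
    ],
})
# Use the "name" column as the row labels (the index).
mountains = mountains.set_index("name")

# --- Label-based selection with .loc[rows, columns] ---
summited = mountains.loc[:, "summited"]                  # one column, as a Series
years = mountains.loc[:, "summited"].values              # same column as a NumPy array
k2_row = mountains.loc["K2", :]                          # one row, as a Series
k2_year = mountains.loc["K2", "summited"]                # one value
two_rows = mountains.loc[["Everest", "Lhotse"], :]       # a list of row labels
some_cols = mountains.loc[:, "height":"summited"]        # label slice: endpoint INCLUDED
recent = mountains.loc[mountains["summited"] > 1954, :]  # Boolean mask on the rows

# --- Position-based selection with .iloc[rows, columns] ---
# Positions count from 0; the index ("name") is not counted as a column.
first_row = mountains.iloc[0, :]
second_col = mountains.iloc[:, 1]               # the "summited" column
one_value = mountains.iloc[1, 1]                # K2's summit year
picked = mountains.iloc[[0, 2], [0, 2]]         # cherry-picked rows and columns
first_two_cols = mountains.iloc[:, 0:2]         # position slice: endpoint EXCLUDED

print(k2_year)                       # 1954
print(list(some_cols.columns))       # ['height', 'summited']
print(list(recent.index))            # ['Kangchenjunga', 'Lhotse']
print(list(first_two_cols.columns))  # ['height', 'summited']
```

The last two printed lines show the endpoint difference side by side: the label slice "height":"summited" includes summited, while the position slice 0:2 stops before position 2. The transcript pulls the array out with .values; current pandas documentation recommends .to_numpy(), which returns the same array here.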
Original Description
This is part of a free Data Munging course. Feel free to browse for other tips and tricks:
https://end-to-end-machine-learning.teachable.com/p/data-munging-tips-and-tricks
[blog post] http://brohrer.github.io/dataframe_indexing.html
Follow me for announcements and updates: https://twitter.com/_brohrer_