Python Tutorial: Arithmetic with Series & DataFrames

DataCamp · Beginner ·🛠️ AI Tools & Apps ·6y ago

Key Takeaways

This video tutorial covers arithmetic and mathematical operations between Pandas Series and DataFrames, including element-wise multiplication, division, and percentage calculations, as well as handling non-aligned indexes.

Full Transcript

Let's explore various arithmetic and mathematical operations between pandas Series and DataFrames. We load daily weather measurements for Pittsburgh from 2013. We make 'Date' the index, and we use parse_dates=True to get datetime objects. With datetime indexes, we can use convenient strings to slice, say, the first week of July from the 'PrecipitationIn' column.

The precipitation data are in inches; let's convert them to centimeters. We use the asterisk to multiply a Series element-wise by 2.54. Remember, we can broadcast standard scalar mathematical operations. Here, broadcasting means the multiplication is applied to all elements in the DataFrame.

Let's find the percentage variation in temperature in the first week of July, that is, the daily minimum and the daily maximum temperatures expressed as a percentage of the daily mean temperature. We can compute this by dividing both the 'Min TemperatureF' and the 'Max TemperatureF' columns by the 'Mean TemperatureF' column and multiplying both by 100. To begin, slice the 'Min TemperatureF' and the 'Max TemperatureF' columns as a DataFrame, week1_range. Next, slice the 'Mean TemperatureF' column as a Series, week1_mean. Dividing the DataFrame week1_range by the Series week1_mean doesn't quite work: the column labels don't match, so the result has all null values. Instead, we want to use the DataFrame .divide() method with the option axis='rows'. The .divide() method provides more fine-grained control than the slash operator for division by itself. This broadcasts the Series week1_mean across each row to produce the desired ratios. We can see the temperature range varies by at most about 10% from the mean in that week.

A related computation is to compute a percentage change along a time series. We do this by subtracting the previous day's value from the current day's value and dividing by the previous day's value. The .pct_change() method does precisely this computation for us. Here, we also multiply the resulting Series by 100 to yield a percentage value. Notice the value in the first row is NaN because there is no previous entry.

Finally, let's examine how arithmetic operations work between distinct Series or DataFrames with non-aligned indexes, which happens often in practice. We'll use Olympic medal data from 1896 to 2008. Here are the top five bronze-medal-winning countries, the top five silver-medal-winning countries, and the top five gold-medal-winning countries for that period. All three DataFrames have the same indices for the first three rows: United States, Soviet Union, and United Kingdom. By contrast, the next two rows are either France, Germany, or Italy.

Let's compute the total medals awarded to each country. We start by adding bronze and silver. Here, we add two Series of five rows and get back a Series with six rows. The index of the sum is the union of the row indices from the original two Series. Arithmetic operations between pandas Series are carried out for rows with common index values. Since Germany does not appear in silver and Italy does not appear in bronze, those rows have NaN in the sum. On examination, we see the value 2247 for the United States row is the sum of 1052 and 1195 from the corresponding rows of the bronze and silver Series, respectively.

We can get the same sum, bronze + silver, with a method invocation using bronze.add(silver). The null values occur in the same places. The default fill value is NaN when summand rows fail to align. We can modify this behavior using the fill_value option of the .add() method. By specifying fill_value=0, the values of Germany and Italy are no longer null. Just as the .divide() method is more flexible than the slash operator for division, the .add() method is more flexible than the plus operator for addition.

Adding all three Series together yields six rows of output, but only three have non-null values. That is, France, Germany, and Italy are not index labels in all three Series, so each of those rows is NaN in the sum. We can also chain calls to the .add() method with fill_value=0 to get rid of those null values in the triple sum.

Now you can get some experience with standard arithmetic operations and methods for Series and DataFrames in the exercises.
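
The temperature steps described in the transcript can be sketched roughly as follows. This is a minimal illustration rather than the course's own code: the temperature values are invented stand-ins for the Pittsburgh data, while the column names, week1_range, week1_mean, .divide() with axis='rows', and .pct_change() follow the transcript.

```python
import pandas as pd

# Invented temperatures (degrees F) for one week; NOT the real Pittsburgh data.
dates = pd.date_range("2013-07-01", periods=7, freq="D")
weather = pd.DataFrame(
    {
        "Min TemperatureF": [66, 66, 71, 70, 69, 70, 70],
        "Mean TemperatureF": [72, 74, 78, 77, 76, 78, 80],
        "Max TemperatureF": [79, 82, 86, 83, 83, 88, 89],
    },
    index=dates,
)

# A two-column DataFrame and a Series that share the same date index.
week1_range = weather[["Min TemperatureF", "Max TemperatureF"]]
week1_mean = weather["Mean TemperatureF"]

# The slash operator would match the Series index (dates) against the
# DataFrame's column labels, which is why the transcript reports all nulls.
# divide(..., axis="rows") matches on the row index instead, so each row
# is divided by that day's mean temperature.
pct_of_mean = week1_range.divide(week1_mean, axis="rows") * 100

# Percentage change from the previous day: (today - yesterday) / yesterday.
# The first entry is NaN because it has no previous day.
daily_change = week1_mean.pct_change() * 100

print(pct_of_mean.round(1))
print(daily_change.round(2))
```

Using the method form here is a design choice forced by alignment: the operator has no way to say which axis the Series should be matched against, whereas the axis argument makes that explicit.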

Original Description

Want to learn more? Take the full course at https://learn.datacamp.com/courses/merging-dataframes-with-pandas at your own pace. More than a video, you'll learn hands-on coding & quickly apply skills to your daily work. --- Let's explore various arithmetic & mathematical operations between Pandas Series & DataFrames. We load daily weather measurements for Pittsburgh from 2013. We make 'Date' the Index & we use parse_dates=True to get datetime objects. With datetime Indexes, we can use convenient strings to slice, say, the first week of July from the 'PrecipitationIn' column. The Precipitation data are in inches; let's convert them to centimeters. We use the asterisk to multiply a Series elementwise by 2.54. Remember, we can broadcast standard scalar mathematical operations. Here, broadcasting means the multiplication is applied to all entries in the DataFrame. Let's find the percentage variation in temperature in the first week of July. That is, the daily minimum & the daily maximum temperatures expressed as a percentage of the daily mean temperature. We compute this by dividing both the 'Min TemperatureF' and the 'Max TemperatureF' columns by the 'Mean TemperatureF' column and multiplying both by 100. To begin, slice the 'Min TemperatureF' and the 'Max TemperatureF' columns as a DataFrame week1_range. Next, slice the 'Mean TemperatureF' column as a Series week1_mean. Dividing DataFrame week1_range by Series week1_mean doesn't quite work. The column labels don't match so the result has all null values. Instead, we want to use the DataFrame .divide() method with option axis='rows'. The .divide() method provides more fine-grained control than the division operator by itself. This broadcasts the Series week1_mean values across each row to produce the desired ratios. We can see the temperature range varies by at most about 10% from the mean in that week. A related computation is to compute a percentage change along a time series. We do this
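
As a rough sketch of the loading, slicing, and unit-conversion steps in the description: the file name used in the lesson is not given, so the example below reads a small inline CSV instead. The precipitation values are invented; only the 'Date' and 'PrecipitationIn' names, parse_dates=True, and the 2.54 factor come from the text.

```python
import io

import pandas as pd

# Invented stand-in for the weather CSV (the real file is not named here).
csv_text = """Date,PrecipitationIn
2013-06-29,0.10
2013-06-30,0.00
2013-07-01,0.18
2013-07-02,0.14
2013-07-03,0.00
2013-07-04,0.25
2013-07-05,0.04
2013-07-06,0.23
2013-07-07,0.14
2013-07-08,0.00
"""

# index_col makes 'Date' the index; parse_dates=True parses that index
# into datetime values, giving a DatetimeIndex.
weather = pd.read_csv(io.StringIO(csv_text), index_col="Date", parse_dates=True)

# String-based slicing on a DatetimeIndex includes both endpoints.
week1_in = weather.loc["2013-07-01":"2013-07-07", "PrecipitationIn"]

# A scalar multiplication is broadcast to every element (1 inch = 2.54 cm).
week1_cm = week1_in * 2.54

print(week1_cm)
```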


This video tutorial teaches how to perform arithmetic and mathematical operations between Pandas Series and DataFrames, including handling non-aligned indexes. By the end of this lesson, you will be able to perform various calculations and manipulate data using Pandas.

Key Takeaways
  1. Load daily weather measurements for Pittsburgh from 2013
  2. Convert precipitation data from inches to centimeters
  3. Calculate percentage variation in temperature
  4. Compute percentage change along a time series
  5. Add series with non-aligned indexes
💡 The divide method provides more fine-grained control than the slash operator for division, and the add method is more flexible than the plus operator for addition, especially when dealing with non-aligned indexes.
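
A minimal sketch of adding Series with non-aligned indexes, under stated assumptions: only the United States bronze and silver counts (1052 and 1195) appear in the transcript, and every other number below is a made-up placeholder chosen to keep the arithmetic easy to follow. The country labels and their placement follow the transcript.

```python
import pandas as pd

# Only the United States bronze/silver counts are from the transcript;
# all other values are placeholders, not real medal totals.
bronze = pd.Series(
    {"United States": 1052, "Soviet Union": 500, "United Kingdom": 400,
     "France": 300, "Germany": 200}
)
silver = pd.Series(
    {"United States": 1195, "Soviet Union": 600, "United Kingdom": 500,
     "France": 450, "Italy": 350}
)
gold = pd.Series(
    {"United States": 2000, "Soviet Union": 800, "United Kingdom": 450,
     "Italy": 400, "Germany": 380}
)

# The + operator aligns on index labels: the result index is the union
# of both indexes, and labels missing from either side become NaN.
total = bronze + silver

# .add() with fill_value=0 treats a label missing from one side as 0.
filled = bronze.add(silver, fill_value=0)

# Summing all three with + leaves NaN wherever any Series lacks the label.
triple = bronze + silver + gold

# Chaining .add() calls with fill_value=0 removes those NaN values.
chained = bronze.add(silver, fill_value=0).add(gold, fill_value=0)

print(total)
print(chained)
```

Note that fill_value=0 is only appropriate when a missing label really means "no contribution", as with a country absent from a top-five list in this toy setup; for other data, NaN may be the more honest result.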
