Spreadsheets Tutorial: Standardizing data

DataCamp · Beginner · 🔢 Mathematical Foundations · 6y ago

Skills: Data Literacy (90%)

Key Takeaways

This video tutorial by DataCamp covers standardizing data in spreadsheets, including calculating z-scores manually and with the STANDARDIZE formula so that variables on different scales can be compared.

Full Transcript

In this lesson, you'll learn how to standardize your data. Why do this? Many real-world datasets you'll encounter will have variables that are measured on different scales. For example, height might be measured in feet, while weight might be measured in pounds. This poses a problem because variables on different scales are harder to compare, and it may lead you to misinterpret the importance of a particular column: that column may appear more important simply because it has larger values than another when, in reality, it may have a very similar distribution to the column with smaller values.

The solution to this problem is to standardize your data so that all your variables are on the same scale. In statistics, standardization centers a dataset's distribution around the mean of the data and calculates the number of standard deviations away from the mean each point is. You can standardize your data by calculating z-scores, also known as standard scores. Z-scores are an extension of what you've already seen in this chapter.

To calculate the z-score of a data point, subtract the mean and divide by the standard deviation, as shown in this simple example here, in which we have three data points. First, we need to calculate the mean using the AVERAGE function, and then the standard deviation using STDEVP. Let's add this information into two new columns. To calculate the z-score, we then need to subtract the mean from each data point and divide by the standard deviation.

But you probably don't want to calculate this manually, as we're doing here. Just as with standard deviation, variance, mean, median, and other statistics you've seen so far, there's a spreadsheet formula that makes it easy to calculate z-scores. In the STANDARDIZE formula, you need to pass in the data point, the mean, and the standard deviation, as shown here.

Let's say we had another set of data points that are ten times larger, as you can see here. While the mean and standard deviation are different (ten times larger), the z-scores are exactly the same. Despite the values being ten times larger, each point is the same number of standard deviations from its own column's mean as in the first column, and this allows you to easily compare the two columns.

In the exercises, you'll have the opportunity to practice standardizing your data. Almost done with chapter one: you're a stats rock star!
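The manual calculation described in the transcript can be sketched outside a spreadsheet as well. The Python snippet below is only an illustration: the three data points are hypothetical (the video's actual values are not given in the transcript), and `statistics.pstdev` is used as the population standard deviation, which is what STDEVP computes.

```python
from statistics import mean, pstdev


def z_scores(values):
    """Return the z-score of each value: (value - mean) / population std dev."""
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation, like STDEVP
    return [(v - mu) / sigma for v in values]


# Hypothetical data: three points, and the same points ten times larger.
small = [1, 2, 3]
large = [v * 10 for v in small]

z_small = z_scores(small)
z_large = z_scores(large)

# The mean and standard deviation of `large` are ten times those of `small`,
# but the two lists of z-scores agree up to floating-point rounding.
print([round(z, 3) for z in z_small])
print([round(z, 3) for z in z_large])
```

Multiplying every value by ten multiplies both the distance from the mean and the standard deviation by ten, so the ratio, and therefore the z-score, is unchanged.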

Original Description

Want to learn more? Take the full course at https://learn.datacamp.com/courses/introduction-to-statistics-in-spreadsheets at your own pace. More than a video, you'll learn hands-on coding & quickly apply skills to your daily work. --- In this lesson, you'll learn how to standardize your data. Why do this? Many real-world datasets you'll encounter will often have variables that are measured on different scales. For example, height might be measured in feet, while weight might be measured in pounds. This poses a problem, because variables on different scales are harder to compare, and it may lead you to misinterpret the importance of a particular column - that column may appear more important simply because it has larger values than another, when in reality, it may actually have a very similar distribution to the column with smaller values. The solution to this problem is to standardize your data so that all your variables are on the same scale. In statistics, standardization centers a dataset's distribution around the mean of the data and calculates the number of standard deviations away from the mean each point is. You can standardize your data by calculating z-scores, also known as standard scores. Z-scores are an extension of what you've already seen in this chapter. To calculate the z-score of a data point, subtract the mean and divide by the standard deviation, as shown on this simple example here, in which we have 3 data points. First, we need to calculate the mean, using the AVERAGE formula, and then the standard deviation, using STDEVP. Let's add this information into 2 new columns. To calculate the z-score, we then need to subtract the mean from each data point, and divide by the standard deviation. But you probably don't want to calculate this manually, as we're doing here. Just as with the standard deviation, variance, mean, median, and other statistics you've seen so far, there's a spreadsheets formula that makes it easy to calculate z-scores.

This video tutorial teaches how to standardize data in spreadsheets by calculating z-scores, first manually and then with the STANDARDIZE formula, so that variables on different scales can be compared directly. The accompanying course exercises provide hands-on practice.

Key Takeaways

Calculate the mean of a dataset using the AVERAGE function
Calculate the standard deviation of a dataset using the STDEVP function
Calculate z-scores by subtracting the mean and dividing by the standard deviation
Use the STANDARDIZE formula to calculate z-scores
Compare variables on different scales using z-scores

💡 Standardizing data by calculating z-scores allows for easy comparison of variables on different scales, preventing misinterpretation of importance due to differing scales.
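As a rough illustration of the takeaways above, the following Python sketch mirrors the argument order of the spreadsheet STANDARDIZE formula (data point, mean, standard deviation) and applies it to two columns on different scales. The function names, the height and weight values, and the guard against a non-positive standard deviation are assumptions made for this example, not something shown in the video.

```python
from statistics import mean, pstdev


def standardize(x, mean_value, standard_dev):
    """Hypothetical helper mirroring STANDARDIZE(x, mean, standard_dev)."""
    if standard_dev <= 0:
        # Assumption of this sketch: reject a zero or negative spread.
        raise ValueError("standard_dev must be positive")
    return (x - mean_value) / standard_dev


def standardize_column(column):
    """Standardize every value using the column's own mean and population std dev."""
    mu = mean(column)
    sigma = pstdev(column)  # population standard deviation, like STDEVP
    return [standardize(x, mu, sigma) for x in column]


# Hypothetical columns measured on different scales.
heights_ft = [5.0, 5.5, 6.0]
weights_lb = [120.0, 150.0, 180.0]

height_z = standardize_column(heights_ft)
weight_z = standardize_column(weights_lb)

# After standardizing, both columns are in units of standard deviations
# from their own mean, so they can be compared side by side.
for h, w in zip(height_z, weight_z):
    print(f"height z = {h:+.3f}   weight z = {w:+.3f}")
```

Each standardized column has a mean of zero and a population standard deviation of one, which is what puts the two variables on the same scale.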
