R Tutorial: Know your data

DataCamp · Beginner · 🛠️ AI Tools & Apps · 6y ago


Key Takeaways

The video tutorial demonstrates how to use R to explore and understand a dataset, specifically the bakers data from The Great British Bake Off, using the glimpse function from the dplyr package and the skim function from the skimr package.

Full Transcript

Now that we've read our data into R, let's start with getting to know it a little better. One of the most important things you can do when working with any new data is to learn about how it was collected and do an exploratory data analysis.

We have been working with the bakers data from The Great British Bake Off. In each episode of the show, one baker is eliminated, one wins the technical challenge, and one is chosen as star baker. The title of star baker is based on the baker's performance across three timed challenges: the signature, the technical, and the showstopper.

Now, let's have another look at the bakers data. So far, we've printed tibbles to view them. But if you have lots of columns, most will be cut off when you print. Here, when we print our bakers data with ten columns, we see that there are four more variables that are hidden. To see all the columns, we use the function glimpse from the dplyr package. The argument for glimpse is the name of your tibble. The glimpse output is a transposed view of your data, where each variable appears in rows from top to bottom instead of left to right. Going across each row, glimpse prints the first few observed values for every variable. We also see the number of observations and variables at the top.

You may also want to summarize your data by looking at summary statistics for each variable. A quick way to do this is with the skim function from the skimr package. Like glimpse, the argument for skim is the name of your tibble. Skim provides statistics for every column depending on the type of variable. The results are printed horizontally with one row per variable, divided in sections for each variable type.

Let's break down the first section of output, summarizing our three character variables. For baker, there are no missing values and ten complete observations. For each variable, the minimum and maximum values refer to string lengths. Also, each value is unique here: there are no bakers with the same name.

The next sections of the skim output summarize dates. The variable last_date_uk is the last date that each baker appeared on the show in the UK. From the min and max values, we can tell that our data spans about two years. For the series factor, there are only three unique values across the 10 observations. Looking at the top counts, series four is the most common. The logical column named aired_us is TRUE if that baker appeared in a series that aired in the US, and FALSE if not. The mean tells us that 70% of the bakers here were seen by US viewers.

Numeric variables are summarized last. In addition to the number of missing and complete values, skim returns the means, standard deviations, and quantiles of the variables. A mini histogram is also printed to give you a sense for the distribution of each variable. From this skim output, we know that the average age of these bakers is 34, and bakers appeared in anywhere from 1 to 10 episodes, with a median of 5. Only one of these bakers was crowned star baker in their time on the show, and they won it twice. Most bakers in this table never won the technical challenge, but one did win three technical challenges.

Now it's time to put glimpse and skim into practice with our bake-off data.
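To make the skim numbers described above more concrete, here is a minimal base R sketch that computes a few of them by hand. The vectors are invented toy values chosen to mirror the figures quoted in the transcript (70% aired in the US, 1 to 10 episodes with a median of 5); they are not the real bakers data, and the object names are hypothetical.

```r
# Invented toy values for illustration only, not the course dataset.
baker    <- c("Ana", "Ben", "Cleo", "Dev", "Edie",
              "Finn", "Gus", "Hana", "Ivo", "Jo")
aired_us <- c(TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, FALSE, FALSE, FALSE)
episodes <- c(1, 2, 3, 4, 5, 5, 6, 8, 9, 10)

# Character column: count of missing values, shortest and longest
# string length, and number of distinct values.
n_missing   <- sum(is.na(baker))
name_length <- range(nchar(baker))
n_unique    <- length(unique(baker))

# Logical column: TRUE counts as 1 and FALSE as 0, so the mean is the
# proportion of TRUE values.
share_us <- mean(aired_us)

# Numeric column: quantiles, where the 50% quantile is the median.
episode_quantiles <- quantile(episodes, probs = c(0, 0.25, 0.5, 0.75, 1))

print(share_us)           # 0.7, i.e. 70% of these toy bakers aired in the US
print(episode_quantiles)
```

Because every name in the toy vector is distinct, n_unique equals the number of observations, which is the same check the transcript makes when it notes that no two bakers share a name.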

Original Description

Want to learn more? Take the full course at https://learn.datacamp.com/courses/working-with-data-in-the-tidyverse at your own pace. More than a video, you'll learn hands-on coding & quickly apply skills to your daily work. --- Now that we have read our data into R, let's get to know it a little better. One of the most important things you can do when working with any new data is to learn about how it was collected. We have been working with the bakers data from The Great British Bake Off. On each episode of the show, one baker is eliminated, one wins the technical challenge, and one is chosen as star baker. The title of star baker is based on the baker's performance across three timed challenges; the signature, the technical, and the showstopper. Now, let's have another look at the bakers data. So far, we've printed tibbles to view them. But, if you have lots of columns, most will be cut off when you print. Here, when we print our bakers data with 10 columns, we see that there are 4 more variables that are hidden. To see all the columns, we use the function glimpse from the dplyr package. The argument for glimpse is the name of your tibble. The glimpse output is a transposed view of your data, where each variable appears in rows from top to bottom instead of left to right. Going across each row, glimpse prints the first few observed values for every variable. We also see the number of observations and variables at the top. You may also want to summarize your data by looking at summary statistics for each variable. A quick way to do this is with the skim function from the skimr package. Like glimpse, the argument for skim is the name of your tibble. Skim provides statistics for every column depending on the type of variable. The results are printed horizontally with one row per variable, divided in sections for each variable type. Let's break down the first section of output summarizing our three character variables. For baker, there are no missing va



This video tutorial teaches how to use R to explore and understand a dataset by viewing all of its columns, calculating summary statistics, and reading the mini histograms that skim prints for numeric variables. The tutorial uses the bakers data from The Great British Bake Off as an example.

Key Takeaways

Load the dplyr and skimr packages in R
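Use the glimpse function to view every column of the dataset, including those cut off when printing
Use the skim function to calculate summary statistics for each variable, grouped by variable type
Explore the dataset to understand each variable's type, missing values, and distribution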
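The steps above can be sketched as follows. This is a minimal example that assumes the dplyr and skimr packages are installed; bakers_demo is a hypothetical stand-in tibble with invented names, ages, and dates, since the course's data file is not reproduced here.

```r
# Load the packages: dplyr provides glimpse(), skimr provides skim().
library(dplyr)
library(skimr)

# Hypothetical stand-in for the bakers data. All values are invented
# for illustration and do not come from the course dataset.
bakers_demo <- tibble(
  baker        = c("Ana", "Ben", "Cleo", "Dev", "Edie"),
  series       = factor(c(3, 3, 4, 4, 4)),
  age          = c(28, 41, 35, 30, 36),
  aired_us     = c(FALSE, FALSE, TRUE, TRUE, TRUE),
  last_date_uk = as.Date(c("2012-08-14", "2012-10-16", "2013-08-20",
                           "2013-09-17", "2013-10-22"))
)

# Transposed view: one row per variable, with its type and first values.
glimpse(bakers_demo)

# Summary statistics for every column, grouped by variable type.
skim(bakers_demo)
```

Both functions take the tibble as their first argument, so with your own data you would replace bakers_demo with the name of the tibble you read in.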

💡 The glimpse and skim functions in R provide a quick and easy way to understand the structure and content of a dataset.

