Data frames in R - Transforming data PART I

365 Data Science · Beginner ·📐 ML Fundamentals ·8y ago

Key Takeaways

The video covers data transformation in R using the dplyr package, specifically the filter, select, and mutate functions.

Full Transcript

Hello and welcome back to R for Statistics and Data Science. In the next few lessons we will dive deep into the Star Wars data, and we'll learn how to transform data sets in various creative and not-so-creative ways. Let's get to it.

This is the first real lesson in which we will use the dplyr package. For the distracted souls out there, dplyr is part of the tidyverse, and we got it when we installed the tidyverse ecosystem of packages. It specializes in data manipulation, with tools that deal with filtering, mutating, and summarizing data.

First things first: let's fire up the starwars data frame that comes with dplyr. This time I will save it as star. Notice that the data are saved as a tibble instead of a base R data frame. Let's keep it this way and use some of the tibble properties. Tibbles come in handy here because this is a relatively big data set, and we don't want to see the entire thing every time we do an operation and print to see our results; tibbles limit the printing to just a few rows. Okay. Although we've already looked at it before, if you want to see the data in all of its glory, run View(star). This will open the viewer, and you can scroll through the values to your heart's content.

Right, transforming data. The filter function does what we think it does: it subsets data according to a set of criteria. It looks like this: we pass the data and then the expression according to which we want our data filtered. There can be more than one criterion, of course. For instance, I can select all the droids in the data frame, and now I can call only the ones from Tatooine. Right, yes, that makes sense: it was young Anakin Skywalker who rebuilt C-3PO while still on Tatooine. And R5-D4... well, I'm not sure I know anything about that little R5 unit. Okay.

Filter also works with logical operators, so for example I can call every character that has red, orange, or yellow as an eye color. Okay, the majority of these aren't human. Hmm, I wonder if there are any more humans with weird eyes apart from Darth Vader and Palpatine? No. Yikes.

Alright, next we have the select function. Now, our database may not have hundreds of variables, but looking at the column names, it does feel like I genuinely don't need to know about some of these things. To narrow down the data to the information I want, I can use select. This selects specific individual columns by name. If I want to select a column and then everything between two other columns, I can do this. Isn't this already a lot easier to do than with the base R functions we learned earlier? Hmm, it is. But check this out too: select works nicely with a couple of nifty functions like starts_with or ends_with, which let us subset data in a super intuitive way. So if I wanted to get all the columns that have to do with coloration, I can run this.

Okay, new scenario: there are a bunch of interesting variables you want to look at, but you also don't want to ignore the rest of the data. What do you do? Well, you can use the everything function with select to move the variables you want to the beginning of the table and then show everything else, like this. Sweet, right?

Finally, let's look at the mutate function. Mutate is dplyr's easy way of creating new variables from variables that already exist in the data set. For example, I can calculate the BMI for our characters, because the starwars data has recorded both height and mass information. Of course, this is largely uninformative because the BMI scale is extremely human-centered, but, you know, anything to get the point across.

Now, if mutate is the function to use when you want to add a column to your data while also retaining all the other columns in your data frame, then transmute is what you will opt for if you only want to keep the variable you create. Let me show you what I mean. See? Effectively, transmute created my new variable and allowed me to extract it without tagging everything else along as well. Fantastic.

Okay, I will end this lesson here, because otherwise I'm at risk of going into way too much detail about side comments I make. So thanks for watching, everyone, and in the next lesson we will pick up right where we left off. See you there. For more videos like this one, please subscribe.
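The on-screen code is not captured in the transcript, so here is a minimal sketch of what the filter and select calls described above might look like. It assumes the starwars tibble that ships with dplyr and the name star used in the lesson; the particular columns picked (for example name, height, mass, and the homeworld:starships range) and the intermediate variable names are assumptions chosen for illustration, not the presenter's exact code.

```r
library(dplyr)

# Keep the built-in starwars data as a tibble, under the name used in the lesson.
star <- starwars
# View(star) would open the full table in an interactive viewer.

# --- filter(): keep rows that match one or more conditions ---

# One criterion: every droid in the data.
droids <- filter(star, species == "Droid")

# Several criteria separated by commas are combined with AND.
tatooine_droids <- filter(star, species == "Droid", homeworld == "Tatooine")

# Logical operators: | means OR.
warm_eyes <- filter(
  star,
  eye_color == "red" | eye_color == "orange" | eye_color == "yellow"
)

# Narrow the previous result down to humans only.
warm_eyed_humans <- filter(warm_eyes, species == "Human")

# --- select(): keep columns ---

# Individual columns by name.
basics <- select(star, name, height, mass)

# One column plus every column between two others (inclusive range).
name_and_origin <- select(star, name, homeworld:starships)

# Helper functions: every column whose name ends in "color".
colours <- select(star, name, ends_with("color"))

# Move a few columns to the front and keep all the others after them.
reordered <- select(star, name, homeworld, species, everything())
```

Printing any of these objects shows only the first few rows, because the results are still tibbles. The three-way OR could also be written as eye_color %in% c("red", "orange", "yellow"), which tends to read better as the list of values grows.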

Original Description

👉🏻 Download Our Free Data Science Career Guide: https://bit.ly/2DZt6hc
👉🏻 Sign up for Our Complete Data Science Training with 57% OFF: https://bit.ly/2QctScR

How to filter, mutate, and summarize a data frame in R using the dplyr package.

The filter() function does what we think it does: it subsets a data frame according to a set of criteria. It works like this: we pass the data, and then the expression according to which we want our data filtered. There can be more than one criterion, of course. filter() also works with logical operators.

The select() function narrows down the data frame to the information you specifically want and need to see. select() works nicely with a couple of nifty functions like starts_with() or ends_with(), which let us subset data in a super intuitive way.

mutate() is dplyr's easy way of creating new variables from variables that already exist in the data frame. For example, if you have height and mass information, you can create a BMI variable. If mutate() is the function to use when you want to add a column to your data frame while also retaining all the other columns, then transmute() is what you will opt for if you only want to keep the new variable you create.

► Consider hitting the SUBSCRIBE button if you LIKE the content: https://www.youtube.com/c/365DataScience?sub_confirmation=1
► VISIT our website: https://bit.ly/365ds
🤝 Connect with us on LinkedIn: https://www.linkedin.com/company/365datascience/

365 Data Science is an online educational career website that offers the incredible opportunity to find your way into the data science world no matter your previous knowledge and experience. We have prepared numerous courses that suit the needs of aspiring BI analysts, Data analysts and Data scientists. We at 365 Data Science are committed educators who believe that curiosity should not be hindered by inability to access good learning resources. This is why we focus all our efforts on creating high-quali
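The mutate() versus transmute() contrast in the description can be sketched as follows. This is an illustrative example rather than the code from the video: it assumes the starwars tibble from dplyr, whose height column is in centimetres and mass column in kilograms, and it uses the usual BMI formula of mass in kilograms divided by height in metres squared. The column name bmi and the variable names are assumptions.

```r
library(dplyr)

star <- starwars

# mutate() adds the new column and keeps every existing column.
# height is stored in centimetres, so divide by 100 to get metres.
with_bmi <- mutate(star, bmi = mass / (height / 100)^2)

# transmute() computes the same column but keeps only what is named in the call.
only_bmi <- transmute(star, bmi = mass / (height / 100)^2)

# Compare the shapes of the two results.
dim(with_bmi)
dim(only_bmi)
```

Characters with a missing height or mass get NA for bmi. In recent dplyr releases transmute() is documented as superseded by mutate(.keep = "none"); it still works as shown here.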

Playlist

Uploads from 365 Data Science · 365 Data Science · 40 of 60

1 Population vs Sample
2 Data Science & Statistics: Levels of measurement
3 Statistics Tutorials: Mean, median and mode
4 Skewness
5 What is a distribution?
6 The Normal Distribution
7 Central limit theorem
8 Student's T Distribution
9 Type I error vs Type II error
10 Hypothesis testing. Null vs alternative
11 The linear regression model
12 Simple linear regression model. Geometrical representation
13 INDEX and MATCH application of the two functions separately and combined [Advanced Excel]
14 INDIRECT Excel Function: How it works and when to use it [Advanced Excel]
15 VLOOKUP and MATCH another useful functions combination [Advanced Excel]
16 VLOOKUP COLUMN and ROW - Handle large data tables with ease [Advanced Excel]
17 The ELIF keyword [Python Fundamentals]
18 Working with Tuples in Python
19 Database Terminology - A Beginners Guide
20 Relational Database Essentials
21 Database vs Spreadsheet - Advantages and Disadvantages
22 Conditional Statements and Loops
23 Backpropagation – The Math Behind Optimization
24 Monte Carlo: Forecasting Stock Prices Part I
25 Monte Carlo: Forecasting Stock Prices Part II
26 Monte Carlo: Forecasting Stock Prices Part III
27 365 Data Science Online Program
28 Data frames - Creating a data frame
29 Data Science & Statistics: Slicing a matrix in R
30 Data frames in R - Exporting data in R
31 Data frames in R - Transforming data PART II
32 Data Frames in R - Subsetting a data frame
33 Data Science & Statistics: Matrix arithmetic in R
34 Data Science & Statistics: Indexing an element from a matrix
35 Data Frames in R - Extending a data frame
36 Data Science & Statistics: Creating a matrix in R FASTER
37 Data Science & Statistics: Creating a Matrix in R
38 Data frames - Importing data in R
39 Data frames in R - Getting a sense of your data
40 Data frames in R - Transforming data PART I
41 Data frames in R - Import a CSV in R
42 Data Science & Statistics: Matrix operations in R
43 Data Science & Statistics: Matrix recycling in R
44 Tableau vs Excel: When to use Tableau and when to use Excel
45 Download Tableau: Learn how to download Tableau Public
46 Connecting data sources: Useful tips when connecting data sources to Tableau
47 The Tableau interface: See how to navigate through the Tableau interface
48 Tableau data visualization: Create your first Tableau visualization!
49 Duplicating sheets: This is how to duplicate a sheet in Tableau
50 Build a table in Tableau: The steps needed to create a simple table in Tableau
51 Custom fields in Tableau: Using Tableau operators to create custom fields
52 Custom fields in Tableau: Add calculations to tables through custom fields
53 Totals in Tableau: Learn how to display subtotals and totals in Tableau
54 Gross Margin calculation in Tableau
55 What is a filter in Tableau: Set up a filter in Tableau to specify the data you want to show
56 Joins in Tableau: Inner, outer, left, or a right join in Tableau
57 Building a Tableau dashboard: Three types of charts you want to have in a Tableau dashboard
58 Creating great looking charts in Tableau: Real life Exercise on charts in Tableau
59 Joins in Tableau: Choose the correct join type
60 How to make a data check in Tableau: A quick data check is better than no data check

This video teaches how to transform data in R using the dplyr package, covering the filter, select, and mutate functions. It provides hands-on examples using the Star Wars data set.

Key Takeaways
  1. Load the dplyr package
  2. Create a data frame
  3. Use the filter function to subset data
  4. Use the select function to choose specific columns
  5. Use the mutate function to create new variables
  6. Use the transmute function to create new variables and keep only those
💡 The dplyr package provides an efficient and intuitive way to transform and manipulate data in R.
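
The six steps above can be walked through end to end on a tiny hand-built data frame. This is only a sketch: the crew data frame, its column names, and its values are hypothetical and invented for illustration, so that each result is small enough to check by eye.

```r
# 1. Load the dplyr package.
library(dplyr)

# 2. Create a data frame (hypothetical values, for illustration only).
crew <- data.frame(
  name      = c("Ana", "Bo", "Cy", "Di"),
  role      = c("pilot", "engineer", "pilot", "medic"),
  height_cm = c(170, 182, 165, 175),
  mass_kg   = c(65, 90, 58, 70)
)

# 3. filter(): keep only the rows where role is "pilot".
pilots <- filter(crew, role == "pilot")

# 4. select(): keep name plus every column whose name starts with "height".
heights <- select(crew, name, starts_with("height"))

# 5. mutate(): add a BMI column (kg / m^2) and keep all existing columns.
with_bmi <- mutate(crew, bmi = mass_kg / (height_cm / 100)^2)

# 6. transmute(): keep only the columns named in the call.
bmi_only <- transmute(crew, name, bmi = mass_kg / (height_cm / 100)^2)

print(pilots)
print(bmi_only)
```

Each verb takes the data frame as its first argument and returns a new data frame, leaving crew itself unchanged.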
