R Tutorial: Visualizing subsets
Key Takeaways
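Shows how detailed visualizations of meaningful data subsets complement summary visualizations in R, using stock prices and taxi fares as examples (from the course Visualizing Big Data with Trelliscope in R)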
Full Transcript
We have seen that visualizing summaries can help us discover and describe overall trends and relationships in the data. In this section, we will explore some examples of how we can complement summary visualizations with detailed visualizations of smaller subsets of our data.
While summary visualizations can be very revealing, sometimes important insights are covered up in the summarization, and we need to look at the data in more detail to discover them.
For example, here we have a summary visualization of the annual return for four stocks. From the summary, it appears all four stocks had a similar year: a return of around 13%. However, looking at detailed plots of the daily prices, we see a very different year for each stock. We'll have more fun with this stock data in chapter 3.
Visualizing large data in detail is challenging because there's too much data to look at. A useful technique in this case is to take a manageable subset of the data that has some natural meaning (for example, all data for one stock), and visualize and explore it.
As we saw in one of our previous exercises, the distribution of the tip amount is zero for all payment types but credit card. This is an interesting phenomenon that we want to get to the bottom of. With cash payments, does the taxi payment system not distinguish between tips and fare? Or does the total fare amount just not include the amount that was tipped?
To investigate this question, we turn to detailed visualization of a subset of our data. We expect rides of the same nature to have similar fare and tip amounts. Therefore, if we can pull out a subset of our data for similar routes, we can compare the distributions of fare and tip amount to investigate our question. We expect the distributions of total fare for rides paid with cash and card to look similar if both cases include tips.
Here we have extracted a subset of the data for the most popular route, from the Upper East Side South to the Upper East Side North of Manhattan. Looking only at these trips, and only at cash and credit transactions, we have about 5,000 observations.
Let's do a check to ensure that this subset is well-behaved. Looking at the relationship between total fare and trip duration, we expect the relationship to be cleaner, since we are focusing on one simple route. Even with data this small, we are still overplotting many points, and we can alleviate this to a degree by using the alpha parameter to add transparency to the points. This looks much cleaner than what we saw for all routes.
To compare the distribution of payments using card versus cash, we can use a quantile plot. This displays the ordered values of the data against the quantiles of a uniform distribution, and is often more useful than a histogram for comparing distributions. We create a quantile plot with ggplot2 using geom_qq, specifying that the data should be plotted against the uniform distribution.
In this plot, we see that the card and cash distributions have a similar shape but are shifted. We also see that the cash payments are made up of several discrete values while card payments are more continuous, which we wouldn't be able to see in a histogram due to binning. From this, we can reasonably conclude that tips are not included in the total reported fare amount for cash payments.
In the exercise, we will see whether the two distributions are similar if we remove tips from both. Let's go!
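The quantile plot and the use of transparency described in the transcript can be sketched roughly as follows. This is a minimal illustration on simulated data, not the course's actual code: the data frame `route`, its columns `payment_type`, `total_amount`, and `trip_duration`, and all simulated values are assumptions made up for the example. Only `geom_qq()`, `geom_point()`, and the `alpha` parameter come from ggplot2 itself.

```r
library(ggplot2)

set.seed(42)
n <- 500

# Simulated stand-in for the single-route subset (all values are assumptions).
fare <- round(runif(n, 5, 9) * 2) / 2           # fares in 50-cent steps
tip  <- round(fare * runif(n, 0.10, 0.30), 2)   # hypothetical card tips

route <- data.frame(
  payment_type  = rep(c("Card", "Cash"), each = n),
  total_amount  = c(fare + tip, fare),          # toy cash totals leave out the tip
  trip_duration = rep(fare * 1.2 + rnorm(n, sd = 0.5), times = 2)  # minutes
)

# Sanity check: total fare versus duration, with alpha to reduce overplotting.
scatter_plot <- ggplot(route, aes(x = trip_duration, y = total_amount)) +
  geom_point(alpha = 0.2)

# Quantile plot: ordered values against quantiles of a uniform distribution,
# one set of points per payment type.
qq_plot <- ggplot(route, aes(sample = total_amount, color = payment_type)) +
  geom_qq(distribution = stats::qunif) +
  labs(x = "Quantile (uniform)", y = "Total fare")

print(scatter_plot)
print(qq_plot)
```

Because the toy cash totals are built without a tip, the two curves come out with a similar shape but shifted, and the cash curve shows visible steps. That mirrors the pattern described in the transcript, but here it follows from how the data was simulated rather than from real taxi records.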
Original Description
Want to learn more? Take the full course at https://learn.datacamp.com/courses/visualizing-big-data-with-trelliscope-in-r at your own pace. More than a video, you'll learn hands-on coding & quickly apply skills to your daily work.
---
We have seen that visualizing summaries can help us discover and describe overall trends and relationships in the data. In this section, we will explore some examples of how we can complement summary visualizations with detailed visualizations of smaller subsets of our data.
While summary visualizations can be very revealing, sometimes important insights are covered up in the summarization and we need to look at the data in more detail to discover them.
For example, here we have a summary visualization of the annual return for four stocks. From the summary, it appears all four stocks had a similar year - a return of about 13%.
However, looking at detailed plots of the daily prices, we see a very different year for each stock. We'll have more fun with this stock data in chapter 3.
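To make the point about summaries hiding detail concrete, here is a small sketch in base R. The two stocks, their symbols, and their prices are invented for illustration and are not the course's data. Both are constructed to end the year with the same 13% return while following very different daily paths.

```r
# Hypothetical daily closing prices for two made-up stocks over 252 trading days.
n_days <- 252
steady <- seq(100, 113, length.out = n_days)
# "VOLATILE" dips to 80 mid-year, then recovers to the same closing price.
volatile <- c(seq(100, 80, length.out = n_days / 2),
              seq(80, 113, length.out = n_days / 2))

prices <- data.frame(
  day    = rep(seq_len(n_days), times = 2),
  symbol = rep(c("STEADY", "VOLATILE"), each = n_days),
  close  = c(steady, volatile)
)

# Summary view: annual return = last close / first close - 1, per stock.
annual_return <- sapply(split(prices$close, prices$symbol),
                        function(p) p[length(p)] / p[1] - 1)
print(round(annual_return, 2))

# Detailed view: the range of daily prices tells a different story per stock.
price_range <- sapply(split(prices$close, prices$symbol), range)
print(price_range)
```

The annual return is identical for both symbols, yet the daily price range shows that one of them lost a fifth of its value along the way, which only the detailed view reveals.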
Visualizing large data in detail is challenging because there's too much data to look at! A useful technique in this case is to take a manageable subset of the data that has some natural meaning (such as, for example, all data for one stock), and visualize and explore.
As we saw in one of our previous exercises, the distribution of the tip amount is zero for all payment types but credit card.
This is an interesting phenomenon that we want to get to the bottom of. With cash payments, does the taxi payment system not distinguish between tips and fare? Or does the total fare amount just not include the amount that was tipped?
To investigate this question, we turn to detailed visualization of a subset of our data.
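A detailed view starts by pulling out a subset with some natural meaning, such as one route and the payment types of interest. The sketch below uses base R on a tiny invented data frame. The name `taxi`, the column names `pickup_nhood`, `dropoff_nhood`, `payment_type`, and `total_amount`, and every value are assumptions for illustration, not the course's dataset.

```r
# Toy stand-in for the taxi data; all column names and values are assumptions.
taxi <- data.frame(
  pickup_nhood  = c("Upper East Side South", "Upper East Side South",
                    "Midtown", "Upper East Side South",
                    "Upper East Side South"),
  dropoff_nhood = c("Upper East Side North", "Upper East Side North",
                    "Upper East Side North", "Chelsea",
                    "Upper East Side North"),
  payment_type  = c("Card", "Cash", "Card", "Cash", "No Charge"),
  total_amount  = c(9.8, 7.5, 14.2, 12.0, 6.0),
  stringsAsFactors = FALSE
)

# Keep one route and only the two payment types we want to compare.
route <- subset(
  taxi,
  pickup_nhood == "Upper East Side South" &
    dropoff_nhood == "Upper East Side North" &
    payment_type %in% c("Card", "Cash")
)

print(route)
print(table(route$payment_type))
```

Restricting to a single route keeps rides comparable, so any difference between the card and cash distributions is more likely to reflect how payments are recorded than how far the taxi travelled.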
We expect rides of the same nature to have similar fare and tip amounts. Therefore, if we can pull out a subset of our data for similar routes, we can compare the distributions of fare and tip amount to investigate our question. We expect the distributions of total fare for rides paid with cash and card to look similar if both cases include tips.