Statistical Learning: 12.R.3 Hierarchical Clustering

Stanford Online · Beginner ·📐 ML Fundamentals ·3y ago
Statistical Learning, featuring Deep Learning, Survival Analysis and Multiple Testing

Trevor Hastie, Professor of Statistics and Biomedical Data Sciences at Stanford University - https://statistics.stanford.edu/people/trevor-j-hastie
Robert Tibshirani, Professor of Statistics and Biomedical Data Sciences at Stanford University - https://statistics.stanford.edu/people/robert-tibshirani
Jonathan Taylor, Professor of Statistics at Stanford University - https://statistics.stanford.edu/people/jonathan-taylor

You can take Statistical Learning as an online course on edX, and you can choose a verified path to earn a certificate on completion. The course is offered in R (https://www.edx.org/course/statistica) or in Python (https://www.edx.org/learn/data-analysis-statistics/stanford-university-statistical-learning-with-python). For more information about courses on Statistics, you can browse our Stanford Online Catalog: https://stanford.io/3QHRi72

What You'll Learn

This video covers hierarchical clustering, a form of clustering that works off a pairwise distance matrix. It uses the hclust() function in R with complete, single, and average linkage.
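As a minimal sketch of the distance-matrix step, the following R code builds the pairwise distances for 100 points and fits a tree with each linkage. The simulated data, the seed, and the object names (d, hc.complete, hc.single, hc.average) are assumptions made for illustration, not the lecture's actual data.

```r
# Simulated stand-in for the lecture's data: 100 points in 2 dimensions.
set.seed(1)                       # arbitrary seed, for reproducibility only
x <- matrix(rnorm(100 * 2), ncol = 2)

# dist() stores the lower triangle of the pairwise distance matrix
# (Euclidean distance by default).
d <- dist(x)
dim(as.matrix(d))                 # full matrix is 100 by 100
length(d)                         # 100 * 99 / 2 = 4950 distinct pairs

# hclust() takes the distance object, not the raw data.
hc.complete <- hclust(d, method = "complete")
hc.single   <- hclust(d, method = "single")
hc.average  <- hclust(d, method = "average")

# n observations are joined by n - 1 merges.
nrow(hc.complete$merge)           # 99
```

Calling plot(hc.complete) on the result draws the dendrogram.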

Full Transcript

Finally, we're going to get to hierarchical clustering, which is another form of clustering, but different, and this works off a distance matrix. So we'll use our same data again, and we'll compute dist(x), which computes a 100 by 100 pairwise distance matrix, and we make a call to hclust. So hclust is the tool that we use for doing hierarchical clustering. By now you've become pretty adept at R. If you want to find out more about hclust, you just go help(hclust) and you'll get the help page. Once you've computed the fit, you know how to print it out. And if all else fails, if there's something else you'd like to find out, searching the web is a really good idea these days. If you search the web for related queries, tons of stuff comes up, and you can often find out all kinds of interesting information.

So here we ran hclust with method equals complete, and we'll just plot the fit, and it plots a dendrogram that shows the clustering. If you recall, this is a bottom-up clustering technique where it continuously joins together smaller clusters to make bigger clusters until eventually you get to one big cluster. Now, we know there are four natural clusters in these data, and from the heights of the arms of the dendrogram it's evident that there are four big clusters here, and these will almost certainly correspond to the original divisions in the data that we created.

Complete is how it decides how close two clusters are, and what complete does is it uses the largest pairwise distance between a point in one cluster and a point in another cluster. There are other methods we can use. Single linkage clustering, it's called, is the same call except method equals single. So instead of using the largest distance, it uses the smallest distance, and it gives you a rather different looking picture. The four big groups that we saw aren't as easily evident in this one. It seems to have found one, two, three groups, and then somewhere in here is a division into the fourth group. So single linkage clustering tends to find long, strung-out clusters, because it's just looking at the closest point between clusters, and so you might not get nice clumpy balls like we expect to see in this case. Average is somewhere in between, and indeed the plot for average looks somewhere in between. But for these data I think we'd probably prefer method equals complete.

So we're going to compare with the actual clusters in the data. Again, we'll use a function, cutree, and we'll cut it at level four. Okay, so that's the first thing we'll do, and that cuts our original cluster tree. Let's get its plot up there again. Here's the dendrogram. We've cut it at level four, so that means at the point where there are four clusters. So now we're going to identify who's in each of these clusters, and for that we use the function cutree. All cutree does is you tell it the level that you want, which is the number of clusters you want, here four, and it gives you back a vector of cluster assignments.

So now we can use the table function to tabulate those assignments against what we know to be the real assignments, which are given by a variable called which. And so you get this little table here. Again, the orders are arbitrary here, so what you expect to see is some big numbers with zeros elsewhere. The big numbers are the 17, 31, 30 and 19, and the small numbers are the misidentifications. So there were three misidentifications here. We can also do a tabulation and compare it with k-means clustering, and with k-means clustering there were two disagreements. So both k-means and hierarchical clustering didn't know the true assignments, and they agreed mostly, but on two points they disagreed.

Now, there are fancy ways that you can plot hierarchical clustering trees that actually color the leaves according to the original cluster membership or some other variables. These tend to be rather technical. I actually looked to see how to do this by searching on the web, and even though I was able to do it, it's probably too technical to put in this demonstration. But if that's something you wanted to do, I urge you to have a look. People have written functions for doing that. What we'll do is something a little simpler. The plot command for dendrograms, for cluster trees, has a labels argument, and we're going to label it according to the original cluster assignment. If you can see here, you'll see that all these guys down here are fours, there's twos, there's ones and threes. It's not that clear, but if we had a bigger place to plot, you'd be able to see which of the points were misassigned.

So again, use help to find out more about hierarchical clustering, and again, if all else fails, go and search on the web. And once again, this is an R Markdown document in RStudio, so we can knit it, and then we get a nice HTML document which gives a summary of our results, with any figures we made, in one nice document. And we've got our figures. This is something you can share with some colleagues, to show the analysis that you've done for them, or for a demonstration. So there we go.
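The comparison described in the transcript can be sketched roughly as follows. The simulated data (cluster centers, seed) are assumptions chosen so the four groups are well separated, so the counts will not match the 17, 31, 30 and 19 quoted in the lecture. The variable name which follows the transcript; hc.complete, hc.cut, km.out, tab.truth and tab.km are illustrative names.

```r
# Simulated stand-in for the lecture's data (centers and seed are assumptions).
set.seed(2)
which <- sample(1:4, 100, replace = TRUE)     # true cluster of each point
centers <- rbind(c(0, 0), c(10, 0), c(0, 10), c(10, 10))
x <- matrix(rnorm(100 * 2), ncol = 2) + centers[which, ]

# Complete-linkage tree on the pairwise Euclidean distances.
hc.complete <- hclust(dist(x), method = "complete")

# Cut the tree where it has four clusters; returns one label per observation.
hc.cut <- cutree(hc.complete, k = 4)

# Cross-tabulate against the true assignments. Label order is arbitrary,
# so look for one large count per row rather than a diagonal.
tab.truth <- table(hc.cut, which)
print(tab.truth)

# Same comparison against k-means with four centers.
km.out <- kmeans(x, centers = 4, nstart = 15)
tab.km <- table(hc.cut, km.out$cluster)
print(tab.km)

# Label the dendrogram leaves by true cluster (drawn only in an interactive session).
if (interactive()) plot(hc.complete, labels = which)
```

Because the cluster labels returned by cutree() and kmeans() are arbitrary, agreement shows up as one dominant cell per row and column, not necessarily on the diagonal.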

This video teaches hierarchical clustering with the hclust() function in R. It covers complete, single, and average linkage, how to interpret dendrograms, and how to compare clustering results.

Key Takeaways
  1. Compute a distance matrix using the dist() function in R
  2. Apply hierarchical clustering using the hclust() function
  3. Plot the dendrogram using the plot() function
  4. Cut the tree at a specified level using the cutree() function
  5. Tabulate cluster assignments using the table() function
  6. Compare with k-means clustering results
💡 Hierarchical clustering can be used to identify clusters in a dataset, and the choice of linkage method (e.g. complete, single, average) can affect the results.
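
To make the effect of the linkage choice concrete, here is a small hand-worked sketch in R. The four points on a line are hypothetical toy data, and the names a, b and cross are illustrative. It computes the between-cluster distance under each linkage directly, then checks that the height of the final merge reported by hclust() agrees.

```r
# Two tiny hand-made clusters on a line (hypothetical toy data).
a <- c(0, 1)        # cluster A
b <- c(4, 6)        # cluster B

# All pairwise distances between a point in A and a point in B.
cross <- abs(outer(a, b, "-"))    # rows index A, columns index B

max(cross)     # complete linkage: largest pairwise distance  = 6
min(cross)     # single linkage: smallest pairwise distance   = 3
mean(cross)    # average linkage: mean pairwise distance      = 4.5

# hclust() first joins (0, 1) and (4, 6), then joins A and B last,
# so the final merge height equals the A-to-B linkage distance.
d <- dist(c(a, b))
h.complete <- max(hclust(d, method = "complete")$height)   # 6
h.single   <- max(hclust(d, method = "single")$height)     # 3
h.average  <- max(hclust(d, method = "average")$height)    # 4.5
```

The same two clusters sit at three different heights depending on the linkage, which is why the dendrograms for complete, single, and average linkage can look quite different on the same data.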
