Statistical Learning: 12.R.3 Hierarchical Clustering
Statistical Learning, featuring Deep Learning, Survival Analysis and Multiple Testing
Trevor Hastie, Professor of Statistics and Biomedical Data Sciences at Stanford University - https://statistics.stanford.edu/people/trevor-j-hastie
Robert Tibshirani, Professor of Statistics and Biomedical Data Sciences at Stanford University - https://statistics.stanford.edu/people/robert-tibshirani
Jonathan Taylor, Professor of Statistics at Stanford University - https://statistics.stanford.edu/people/jonathan-taylor
You can take Statistical Learning as an online course on edX, and you can choose a verified path to earn a certificate of completion. The course is offered in R (https://www.edx.org/course/statistica) or in Python (https://www.edx.org/learn/data-analysis-statistics/stanford-university-statistical-learning-with-python).
For more information about courses on Statistics, you can browse our Stanford Online Catalog: https://stanford.io/3QHRi72
What You'll Learn
This video covers hierarchical clustering, a form of clustering that works from a distance matrix, using the hclust function in R with complete, single, and average linkage.
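As a rough sketch of the workflow described here, the R code below builds a distance matrix with dist() and fits hclust() with each of the three linkage methods. The video reuses data created earlier in the lab; the four-cluster simulation below (the seed, the helper matrix xmean, and the spread of the cluster means) is an assumption made so the example runs on its own. Only the names x and which come from the transcript.

```r
# Illustrative simulated data (assumption, not necessarily the lab's data):
# 100 points in 2D, each shifted by one of 4 random cluster means.
set.seed(101)
x <- matrix(rnorm(100 * 2), 100, 2)
xmean <- matrix(rnorm(8, sd = 4), 4, 2)    # hypothetical: one mean per cluster
which <- sample(1:4, 100, replace = TRUE)  # true cluster label of each point
x <- x + xmean[which, ]

# dist() computes all pairwise Euclidean distances between the 100 rows
# (stored compactly as a lower triangle); hclust() builds the tree from it.
d <- dist(x)
hc.complete <- hclust(d, method = "complete")
hc.single   <- hclust(d, method = "single")
hc.average  <- hclust(d, method = "average")

# Draw the three dendrograms side by side for comparison.
par(mfrow = c(1, 3))
plot(hc.complete, main = "Complete linkage", xlab = "", sub = "")
plot(hc.single,   main = "Single linkage",   xlab = "", sub = "")
plot(hc.average,  main = "Average linkage",  xlab = "", sub = "")
```

Because complete linkage merges by the largest cross-cluster distance, its last merge height equals the largest pairwise distance in the data, which is one quick sanity check on the fit.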
Full Transcript
Finally, we're going to get to hierarchical clustering, which is another form of clustering, but a different one: it works off a distance matrix. So we'll use our same data again, and we'll compute dist(x), which computes a 100 by 100 pairwise distance matrix, and we make a call to hclust. hclust is the tool that we use for doing hierarchical clustering. By now you've become pretty adept at R: if you want to find out more about hclust, you just type help(hclust) and you'll get the online help. Once you've computed the fit, you know how to print it out, and if all else fails, if there's something else you'd like to find out, searching the web is a really good idea these days. If you search the web for related queries, tons of stuff comes up, and you can often find all kinds of interesting information.
So here we ran hclust with method = "complete", and we'll just plot it; it plots a dendrogram that shows the clustering. If you recall, this is a bottom-up clustering technique: it continuously joins together smaller clusters to make bigger clusters until eventually you get to one big cluster. Now, we know there are four natural clusters in these data, and from the heights of the arms of the dendrogram it's evident that there are four big clusters here. These will almost certainly correspond to the original divisions in the data that we created.
The method, complete, is how it decides how close two clusters are: complete linkage uses the largest pairwise distance between a point in one cluster and a point in the other cluster. There are other methods we can use. Single linkage clustering, as it's called, is the same call except with method = "single", so instead of using the largest distance it uses the smallest distance. It gives you a rather different-looking picture, and the four big groups that we saw aren't as easily evident in this one: it seems to have found one, two, three groups, and then somewhere in here is a division into the fourth group. Single linkage clustering tends to find long, strung-out clusters, because it's only looking at the closest points between clusters, so you might not get the nice clumpy balls that we expect to see in this case. Average linkage is somewhere in between, and indeed the plot for average looks somewhere in between, but for these data I think we'd probably prefer method = "complete".
Now we're going to compare with the actual clusters in the data again. Let's get the plot of the dendrogram up there again. We're going to cut it at level four, which means at the point where there are four clusters, and then identify who's in each of these clusters. For that we use the function cutree: you tell it the level you want, which is the number of clusters you want (here, four), and it gives you back a vector of cluster assignments. Now we can use the table function to cross-tabulate those assignments with what we know to be the real assignments, which are given by the variable which, and you get this little table here. Again, the cluster orders are arbitrary, so what you expect to see is some big numbers with zeros elsewhere. The big numbers are the 17, 31, 30, and 19, and the small numbers are the misidentifications; there were three misidentifications here. We can also do a tabulation to compare with K-means clustering, and there were two disagreements. So neither K-means nor hierarchical clustering knew the true assignments; they mostly agreed, but on two points they disagreed.
There are fancier ways to plot hierarchical clustering trees that color the leaves according to the original cluster membership or some other variable. These tend to be rather technical. I actually looked up how to do this by searching the web, and although I was able to do it, it's probably too technical to put in this demonstration; but if that's something you want to do, I urge you to have a look, since people have written functions for doing that. What we'll do is something a little simpler: the plot command for dendrograms (cluster trees) has a labels argument, and we're going to label the leaves according to the original cluster assignment. If you look here, you'll see that all these points down here are fours; there are twos, there are ones, and there are threes. It's not that clear, but if we had a bigger area to plot in, you'd be able to see which of the points were misassigned.
So again, use help to find out more about hierarchical clustering, and again, if all else fails, go and search the web. And once again, this is an R Markdown document in RStudio, so we can knit it and get a nice HTML document that summarizes our results, with any figures we made, in one nice document. This is something you can share with colleagues to show them the analysis you've done, or use for a demonstration. So there we go.
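The transcript defines complete linkage as the largest pairwise distance between a point in one cluster and a point in the other, and single linkage as the smallest; average linkage takes the mean of all those cross-cluster distances. The small R example below uses four made-up points (hypothetical, chosen only to be easy to check by hand) to compute the three quantities directly and compare them with the final merge height that hclust() reports. The points are arranged so that the two made-up clusters are the last pair to be joined under every method.

```r
# Two small made-up clusters on a horizontal line (hypothetical points).
A <- rbind(c(0, 0), c(1, 0))
B <- rbind(c(4, 0), c(6, 0))
pts <- rbind(A, B)

# Distances from each point of A (rows 1-2) to each point of B (rows 3-4):
# 4, 6, 3 and 5.
cross <- as.matrix(dist(pts))[1:2, 3:4]

complete_link <- max(cross)   # largest cross-cluster distance: 6
single_link   <- min(cross)   # smallest cross-cluster distance: 3
average_link  <- mean(cross)  # mean cross-cluster distance: 4.5

# hclust first joins the points within A (height 1) and within B (height 2),
# then joins A and B, so the final merge height is the A-to-B linkage distance.
d4 <- dist(pts)
final_height <- function(method) max(hclust(d4, method = method)$height)
sapply(c("complete", "single", "average"), final_height)
```

The same four points give three different final heights, which is why the complete, single, and average dendrograms in the video look so different even though the data are identical.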
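The cutree() and table() comparison from the transcript can be sketched in R as follows. The simulated data are an illustrative assumption standing in for the lab's data, and nstart = 15 for kmeans() is an arbitrary choice, so the counts need not match the 17, 31, 30, and 19 quoted in the video. The name which for the true labels follows the transcript.

```r
# Illustrative simulated data (assumption): 100 points around 4 random means.
set.seed(101)
x <- matrix(rnorm(100 * 2), 100, 2)
xmean <- matrix(rnorm(8, sd = 4), 4, 2)
which <- sample(1:4, 100, replace = TRUE)  # true cluster labels
x <- x + xmean[which, ]

hc.complete <- hclust(dist(x), method = "complete")

# Cut the tree at the height where it has exactly 4 clusters;
# the result is one cluster number per observation.
hc.cut <- cutree(hc.complete, k = 4)

# Cross-tabulate against the true labels. Cluster numbers are arbitrary,
# so a good result shows one large count per row and small counts elsewhere.
table(hc.cut, which)

# Compare with K-means on the same data (nstart = 15 is an arbitrary choice).
km.out <- kmeans(x, centers = 4, nstart = 15)
table(hc.cut, km.out$cluster)

# Label each leaf with its true cluster so misassigned points can be spotted.
plot(hc.complete, labels = as.character(which),
     main = "Complete linkage, leaves labeled by true cluster", xlab = "", sub = "")
```

Passing the true labels through the labels argument of plot() is the simple alternative to leaf coloring mentioned in the transcript; with 100 leaves the labels are small, so a larger plotting device makes them easier to read.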