Unsupervised Learning

Siraj Raval · Beginner ·🛠️ AI Tools & Apps ·7y ago

Key Takeaways

This video by Siraj Raval demonstrates unsupervised learning techniques, specifically Principal Component Analysis (PCA) and K-Means Clustering, using Python and scikit-learn to find hidden patterns in unstructured data.

Full Transcript

Guys, I think I found it. Classified tree of life.

Hello world, it's Siraj, and the most exciting class of machine learning techniques is called unsupervised learning. Teaching machines to learn for themselves, without having to be explicitly told if everything they do is right or wrong, is the key to true artificial intelligence and perhaps the most important research goal of 2019. I mean, how else are we going to get to fully automated luxury gay space communism? In this episode I'm going to give you a broad overview of this area, as well as teach you two of the most popular unsupervised learning techniques, principal component analysis and k-means clustering, in order to save someone's life.

A patient at a hospital has been suffering from several epilepsy-related seizures. Luckily, we have a data set of their neural activity, recorded by electrodes that were inserted into their brain. The lead surgeon asked us to use unsupervised learning techniques on this neural data to find out what part of their brain is causing the seizures so they can perform surgery on it. Will we save the patient's life? We'll find out at the end of this video, and subscribe if you want to keep learning about AI technology for free.

We can divide machine learning into two types: supervised and unsupervised. There's also reinforcement learning, but that only applies in a real-time environment; it's not a static data spreadsheet. There's also quantum machine... Look, can you please keep it simple for once?

So, supervised learning is synonymous with pattern matching. It's done using the ground truth, meaning we have prior knowledge of what the output values for our input data should be. You know that "hot dog, not hot dog" classifier trope from the popular show Silicon Valley? That's supervised learning. My life is literally that show, so I don't watch it. The goal is to approximate the relationship between input and output data. Most machine learning across every industry is done this way. It's easy, it's straightforward, and it tends to perform very well if given enough samples. But clean, perfectly labeled datasets aren't always easy to find. In fact, 80% of the world's data is unstructured.

The goal of unsupervised learning is to automatically find structure in a data set. This can itself be the goal, discovering hidden patterns in data, or a means to an end, to learn what the most relevant features are. We can further subdivide unsupervised learning into different types of techniques.

Clustering finds data points similar to each other and groups them together. If we had any kind of population data, whether we were a government organization or a start-up with a product like diet water (yes, that's real), basically anyone trying to reach a certain set of people, we'd want to segment that population into smaller clusters with similar demographics and purchasing habits so that we could target them most effectively, spending our marketing budget.

Anomaly detection finds the outliers in a collection of data points. Banks use it to find fraudulent transactions.

Association finds correlated features between data points, then lets us infer other features of a given data point. Airbnb uses it to recommend other listings you'd probably like.

And dimensionality reduction reduces the number of features in a data set, which makes it easier to visualize and interpret.

Yann LeCun, director of AI research at Facebook, puts it best with his quote: "If intelligence was a cake, unsupervised learning would be the cake, supervised learning would be the icing on the cake, and reinforcement learning would be the cherry on the cake. We now know how to make the icing and the cherry, but we don't know how to make the cake." Talk about strange but weirdly effective metaphors. Here's to you, Yann.

So let's take a look at our data to decide what to do with it. This is a 30-minute-long recording of neural data from an epilepsy patient. A set of electrodes were inserted into the brain of this patient to record the activity of neurons in real time. It picked up electrical spikes of neurons, and we can see several features here that relate to the recording device's measurements, like the channel number, frequency, and the number of samples.

Let's first visualize this data using digital alchemy, aka Python. We want to extract spikes from the signal, and to do that we'll find data points in the signal that are above some predefined threshold and align them at their peak amplitude. We can do this with just 100 random spikes and see that there are at least two types of waveforms in the data: one group of spikes with a sharp, high-amplitude peak and a second group with a broader initial peak. These spikes were likely generated by more than one neuron. If we can find a way to group these waveforms into different clusters, it will help us figure out which spike corresponds to which neurons, which will help surgeons decide where to perform surgery.

But in order to cluster the waveforms, we're going to need to decide which features to input to our algorithm. One possible feature could be, for example, the peak amplitude of the spike or the width of the waveform. But not all features are equally informative and useful. We need to select the features that represent the spike wave shapes the best and get rid of the rest for our prediction to be accurate. The way we're going to do that is to use a type of unsupervised learning called dimensionality reduction, of which there are several techniques, like brute force. No, we're going to use a popular one called principal component analysis, or PCA.

PCA finds the principal components of a dataset. Principal components are the underlying structure in the data. They are the directions where there is the most variance, meaning where the data is most spread out. It's useful to measure data in terms of principal components rather than on a normal XY axis. Imagine that we had a bunch of data points, which we'll denote as Triforce symbols as an ode to the princess. To find the direction with the most variance, we can find the straight line where the data is most spread out when projected onto it. A vertical straight line with the points projected onto it will look kind of like this: not very spread, so there's a small variance, likely no principal component here. A horizontal line, however, with lines projected onto it looks way more spread out, a high variance. There's no straight line we can draw that has a larger variance than a horizontal one. Thus the horizontal line is the principal component in this example.

To find principal components we use linear algebra, one of the mathematical pillars of machine learning. Two concepts here: eigenvectors, which have a direction, and eigenvalues, which are numbers that tell us how much variance there is in the data in that direction. These two concepts come in pairs, like yin and yang, and the eigenvector with the highest eigenvalue is the principal component.

In a three-dimensional data set there are three variables. Imagine all the data points lie on a piece-of-paper-sized plane in this 3D graph. When we find the three eigenvectors and eigenvalues, two will have large eigenvalues and one of the eigenvectors will have an eigenvalue of zero. If we rearrange our axes to be along the eigenvectors rather than the original variables, discarding the third one, we essentially get rid of the useless direction and are able to represent it in two dimensions. We can do this in a single line thanks to scikit-learn. We just need to specify how many components we want.

When I find myself with 50 features, mother and Elle comes to me, predicting just the best ones, let it be.

Once we've reduced the dimensionality of our data, we're ready to perform clustering, the second type of unsupervised learning. A popular clustering technique is called k-means. First we choose a number of K random data points from our sample. These represent the cluster centers, and their number equals the number of clusters. Then we calculate the distance between all the random cluster centers and any other data point. We then assign each data point to the cluster center closest to it. Since we started with random data points, it won't give us a great result, so we repeat the process, and instead of using random data points as cluster centers we calculate the actual cluster centers based on the previous random assignment. This just keeps repeating, and with every iteration the data points that switch clusters go down and we arrive at a global optimum.

We're now in the coochie gang, a newer version of the hoochie game.

A question arises, though: how do we choose the number of clusters? We could try running k-means multiple times with different cluster numbers. When we plot the result, we can analyze it to see if we chose too many clusters, too few, or just the right amount. Based on our domain knowledge, we can expect to find more than two or three separable clusters from a single electrode recording, and our plot seems to confirm this notion.

Another way to decide this is to use the elbow method. The way that this works is to run k-means several times and increase the number of clusters every run, and during every run we calculate the average distance of each data point to its cluster center. The number of clusters increases and the average inter-cluster distance decreases. When we reach six clusters, the average distance to the cluster center does not change any more, and this is called the elbow point. It gives us a recommendation of how many clusters we should use.

By clustering the data, we're able to sort the neuron spikes into distinct regions, which correlate to different parts of the brain. This is going to be supremely helpful for our client at the hospital, and we just used data science to save a patient's life.

Before we pop champagne, there are three things to remember from this video. Unsupervised learning helps find previously unknown patterns in a data set without needing a label. Principal component analysis is a dimensionality reduction technique that helps find the most relevant features in a data set. And k-means clustering is the most popular clustering technique, grouping similar data points together for further analysis.

What is your next data science project? Let me know in the comment section, and please subscribe for more programming videos. For now, I've got to find myself, so thanks for watching.
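
The transcript's "sheet of paper in 3D" example can be reproduced in a few lines of NumPy. This is only a sketch on made-up data, not the code from the video: the points, the plane coefficients, and all variable names are assumptions chosen for illustration. It builds points that lie exactly on a plane, computes the eigenvectors and eigenvalues of their covariance matrix, and shows that the smallest eigenvalue is numerically zero, so dropping that direction loses nothing.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: 500 points on a flat plane in 3D.
# z is an exact linear combination of x and y, so there is no spread
# in the direction perpendicular to the plane.
xy = rng.normal(size=(500, 2))
z = 0.5 * xy[:, 0] - 0.25 * xy[:, 1]
points = np.column_stack([xy, z])

# PCA by hand: centre the data, then eigen-decompose its covariance matrix.
centered = points - points.mean(axis=0)
cov = np.cov(centered, rowvar=False)
eigenvalues, eigenvectors = np.linalg.eigh(cov)  # eigh returns ascending order

# Sort from largest to smallest variance.
order = np.argsort(eigenvalues)[::-1]
eigenvalues = eigenvalues[order]
eigenvectors = eigenvectors[:, order]

# Keep the two directions with large eigenvalues, discard the third.
projected = centered @ eigenvectors[:, :2]

# Mapping the 2D coordinates back recovers the original 3D points,
# because the discarded direction carried (numerically) zero variance.
reconstructed = projected @ eigenvectors[:, :2].T

print("eigenvalues:", eigenvalues)
print("max reconstruction error:", np.abs(reconstructed - centered).max())
```

On real recordings the smallest eigenvalues are rarely exactly zero, so in practice the choice is about how much variance one is willing to give up rather than finding a perfectly useless direction.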

Original Description

Unsupervised learning is the most exciting subfield of machine learning! Finding structure in unstructured data automatically sounds like a dream come true, no need to have a label! In this video, I'll demonstrate 2 types of unsupervised learning techniques: k-means clustering and principal component analysis. We'll use these techniques on neural data from a patient suffering from seizures to see if we can locate the part of their brain in need of surgery to save their life. You'll laugh, you'll cry, but most importantly, you'll learn. Enjoy!

Code for this video: https://github.com/llSourcell/spike_sorting

Please Subscribe! And Like. And comment. That's what keeps me going.

Want more education? Connect with me here:
Twitter: https://twitter.com/sirajraval
Instagram: https://www.instagram.com/sirajraval
Facebook: https://www.facebook.com/sirajology

More learning resources:
https://blog.algorithmia.com/introduction-to-unsupervised-learning/
http://deeplearning.stanford.edu/tutorial/
https://towardsdatascience.com/unsupervised-learning-with-python-173c51dc7f03
https://medium.com/machine-learning-for-humans/unsupervised-learning-f45587588294

Join us at the School of AI: https://theschool.ai/
Join us in the Wizards Slack channel: http://wizards.herokuapp.com/
Please support me on Patreon: https://www.patreon.com/user?u=3191693
Signup for my newsletter for exciting updates in the field of AI: https://goo.gl/FZzJ5w
Hit the Join button above to sign up to become a member of my channel for access to exclusive content!
Join my AI community: http://chatgptschool.io/
Sign up for my AI Sports betting Bot, WagerGPT! (500 spots available): https://www.wagergpt.xyz

This video teaches two unsupervised learning techniques, PCA and K-Means Clustering, for finding hidden patterns in unlabeled data. It demonstrates how to apply them with Python and scikit-learn, showing how structure and relevant features can be found without a labeled dataset.
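
As a rough illustration of how these two techniques fit together in scikit-learn, here is a minimal sketch. It does not use the patient recording or the code from the video: the two synthetic spike shapes, the noise level, and the variable names are all assumptions made for the example. `PCA`, `KMeans`, `fit_transform`, `fit_predict`, and `transform` are real scikit-learn calls. The loop at the end follows the elbow method as described in the transcript, using the average distance from each point to its own cluster center.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA

rng = np.random.default_rng(42)
t = np.linspace(-1.0, 1.0, 40)  # 40 samples per waveform (assumed)

# Two hypothetical spike shapes: a sharp high peak and a broader low one.
narrow = 3.0 * np.exp(-0.5 * (t / 0.08) ** 2)
broad = 1.5 * np.exp(-0.5 * (t / 0.30) ** 2)
waveforms = np.vstack([
    narrow + rng.normal(0.0, 0.1, size=(150, t.size)),
    broad + rng.normal(0.0, 0.1, size=(150, t.size)),
])

# Dimensionality reduction: 40 samples per spike -> 2 principal components.
pca = PCA(n_components=2)
features = pca.fit_transform(waveforms)

# Clustering in the reduced space. n_init=10 restarts k-means from ten
# different initialisations and keeps the best run, since a single run
# can settle in a poor local solution.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = kmeans.fit_predict(features)

# Elbow method: average distance of each point to its own cluster center
# for an increasing number of clusters.
avg_distances = []
for k in range(1, 7):
    model = KMeans(n_clusters=k, n_init=10, random_state=0).fit(features)
    # transform() gives distances to every center; keep the nearest one.
    nearest = model.transform(features).min(axis=1)
    avg_distances.append(float(nearest.mean()))

print("variance explained by 2 components:", pca.explained_variance_ratio_.sum())
for k, d in zip(range(1, 7), avg_distances):
    print(f"k={k}: average distance to center = {d:.3f}")
```

Plotting `avg_distances` against `k` should show a sharp drop followed by a flattening; the bend is the elbow. Where it falls depends entirely on the data, so the value found here says nothing about the real recording.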

Key Takeaways
  1. Visualize the data using Python
  2. Extract spikes from the signal by finding data points above a predefined threshold
  3. Align the spikes at their peak amplitude
  4. Use PCA to reduce the dimensionality of the data
  5. Discard components whose eigenvalue is at or near zero (the third eigenvector in the video's 3D example)
  6. Use K-Means Clustering to group similar data points
💡 Unsupervised learning finds hidden patterns in data without requiring labels, which matters because, as the video puts it, 80% of the world's data is unstructured.
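
Steps 2 and 3 above (threshold-based spike extraction and peak alignment) can be sketched with NumPy as follows. This is an illustrative version under stated assumptions, not the implementation from the video's repository: the function name `extract_spikes`, the `half_window` parameter, and the synthetic test signal are all hypothetical.

```python
import numpy as np


def extract_spikes(signal, threshold, half_window=20):
    """Cut out fixed-length waveforms centred on each supra-threshold peak.

    Hypothetical helper: returns (waveforms, peak_indices).
    """
    above = signal > threshold
    # Upward crossings: this sample is above threshold, the previous is not.
    crossings = np.flatnonzero(above[1:] & ~above[:-1]) + 1

    waveforms, peaks = [], []
    for start in crossings:
        # Look a short distance past the crossing for the maximum.
        stop = min(start + half_window, len(signal))
        peak = start + int(np.argmax(signal[start:stop]))
        # Skip spikes too close to the edges to cut a full window.
        if peak - half_window < 0 or peak + half_window > len(signal):
            continue
        # Skip a crossing that leads to a peak already collected.
        if peaks and peak <= peaks[-1]:
            continue
        peaks.append(peak)
        # Alignment: every waveform has its peak at index half_window.
        waveforms.append(signal[peak - half_window:peak + half_window])
    return np.array(waveforms), np.array(peaks)


# Synthetic signal (assumed): low noise plus three identical bumps.
rng = np.random.default_rng(0)
signal = rng.normal(0.0, 0.02, size=2000)
bump = 3.0 * np.exp(-0.5 * (np.arange(-10, 11) / 2.0) ** 2)
for position in (300, 900, 1500):
    signal[position - 10:position + 11] += bump

waveforms, peaks = extract_spikes(signal, threshold=1.5)
print("waveform array shape:", waveforms.shape)
print("detected peak positions:", peaks)
```

Because every extracted waveform has its peak at the same index, the rows of `waveforms` can be compared sample by sample, which is what makes them usable as input to PCA in step 4.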
