Unsupervised Learning

Siraj Raval · Beginner · 🛠️ AI Tools & Apps · 7y ago

Skills: Unsupervised Learning 90% · LLM Foundations 80% · ML Maths Basics 60%

Key Takeaways

This video by Siraj Raval demonstrates unsupervised learning techniques, specifically Principal Component Analysis (PCA) and K-Means Clustering, using Python and scikit-learn to find hidden patterns in unstructured data.

Full Transcript

Guys, I think I found it. Classified tree of life.

Hello world, it's Siraj, and the most exciting class of machine learning techniques is called unsupervised learning. Teaching machines to learn for themselves, without having to be explicitly told if everything they do is right or wrong, is the key to true artificial intelligence and perhaps the most important research goal of 2019. I mean, how else are we going to get to fully automated luxury gay space communism? In this episode I'm going to give you a broad overview of this area, as well as teach you two of the most popular unsupervised learning techniques, principal component analysis and k-means clustering, in order to save someone's life.

A patient at a hospital has been suffering from several epilepsy-related seizures. Luckily, we have a data set of their neural activity, recorded by electrodes that were inserted into their brain. The lead surgeon asked us to use unsupervised learning techniques on this neural data to find out what part of their brain is causing the seizures so they can perform surgery on it. Will we save the patient's life? We'll find out at the end of this video. And subscribe if you want to keep learning about AI technology for free.

We can divide machine learning into two types: supervised and unsupervised. There's also reinforcement learning, but that only applies in a real-time environment; it's not a static data spreadsheet. There's also quantum machine... "Look, can you please keep it simple for once?"

So, supervised learning is synonymous with pattern matching. It's done using the ground truth, meaning we have prior knowledge of what the output values for our input data should be. You know that "hot dog, not hot dog" classifier trope from the popular show Silicon Valley? That's supervised learning. My life is literally that show, so I don't watch it. The goal is to approximate the relationship between input and output data. Most machine learning across every industry is done this way. It's easy, it's straightforward, and it tends to perform very well if given enough samples. But clean, perfectly labeled datasets aren't always easy to find. In fact, 80% of the world's data is unstructured.

The goal of unsupervised learning is to automatically find structure in a data set. This can itself be the goal (discovering hidden patterns in data) or a means to an end (learning what the most relevant features are). We can further subdivide unsupervised learning into different types of techniques.

Clustering finds data points similar to each other and groups them together. If we had any kind of population data, whether we were a government organization or a start-up with a product like diet water (yes, that's real), basically anyone trying to reach a certain set of people, we'd want to segment that population into smaller clusters with similar demographics and purchasing habits so that we could target them most effectively when spending our marketing budget.

Anomaly detection finds the outliers in a collection of data points. Banks use this to find fraudulent transactions.

Association finds correlated features between data points, then lets us infer other features of a given data point. Airbnb uses this to recommend other listings you'll probably like.

And dimensionality reduction reduces the number of features in a data set, which makes it easier to visualize and interpret.

Yann LeCun, director of AI research at Facebook, puts it best with his quote: "If intelligence was a cake, unsupervised learning would be the cake, supervised learning would be the icing on the cake, and reinforcement learning would be the cherry on the cake. We now know how to make the icing and the cherry, but we don't know how to make the cake." Talk about strange but weirdly effective metaphors. Here's to you, Yann.

So let's take a look at our data to decide what to do with it. This is a 30-minute-long recording of neural data from an epilepsy patient. A set of electrodes was inserted into the brain of this patient to record the activity of neurons in real time. It picked up electrical spikes of neurons, and we can see several features here that relate to the recording device's measurements, like the channel number, frequency, and the number of samples.

Let's first visualize this data using digital alchemy, a.k.a. Python. We want to extract spikes from the signal, and to do that we'll find data points in the signal that are above some predefined threshold and align them at their peak amplitude. We can do this with just 100 random spikes and see that there are at least two types of waveforms in the data: one group of spikes with a sharp, high-amplitude peak and a second group with a broader initial peak. These spikes were likely generated by more than one neuron. If we can find a way to group these waveforms into different clusters, it will help us figure out which spike corresponds to which neuron, which will help surgeons decide where to perform surgery.

But in order to cluster the waveforms, we're going to need to decide which features to input to our algorithm. One possible feature could be, for example, the peak amplitude of the spike or the width of the waveform. But not all features are equally informative and useful. We need to select the features that represent the spike wave shapes the best and get rid of the rest for our prediction to be accurate. The way we're going to do that is to use a type of unsupervised learning called dimensionality reduction, of which there are several techniques, like brute force. No, we're going to use a popular one called principal component analysis, or PCA.

PCA finds the principal components of a dataset. Principal components are the underlying structure in the data. They are the directions where there is the most variance, meaning where the data is most spread out. It's useful to measure data in terms of principal components rather than on a normal XY axis. Imagine that we had a bunch of data points, which we'll denote as Triforce symbols as an ode to the princess. To find the direction with the most variance, we can find the straight line where the data is most spread out when projected onto it. A vertical straight line with the points projected onto it will look kind of like this: not very spread out, so there's a small variance. Likely no principal component here. A horizontal line, however, with points projected onto it, looks way more spread out: a high variance. There's no straight line we can draw that has a larger variance than a horizontal one. Thus the horizontal line is the principal component in this example.

To find principal components we use linear algebra, one of the mathematical pillars of machine learning. Two concepts here: eigenvectors, which have a direction, and eigenvalues, which are numbers that tell us how much variance there is in the data in that direction. These two concepts come in pairs, like yin and yang, and the eigenvector with the highest eigenvalue is the principal component.

In a three-dimensional data set there are three variables. Imagine all the data points lie on a piece-of-paper-sized plane in this 3D graph. When we find the three eigenvectors and eigenvalues, two will have large eigenvalues and one of the eigenvectors will have an eigenvalue of zero. If we rearrange our axes to be along the eigenvectors rather than the original variables, discarding the third one, we essentially get rid of the useless direction and are able to represent the data in two dimensions. We can do this in a single line thanks to scikit-learn. We just need to specify how many components we want.

When I find myself with 50 features, Mother ML comes to me, predicting just the best ones, let it be.

Once we've reduced the dimensionality of our data, we're ready to perform clustering, the second type of unsupervised learning. A popular clustering technique is called k-means. First we choose a number K of random data points from our sample. These represent the cluster centers, and their number equals the number of clusters. Then we calculate the distance between all the random cluster centers and any other data point. We then assign each data point to the cluster center closest to it. Since we started with random data points, it won't give us a great result, so we repeat the process, and instead of using random data points as cluster centers, we calculate the actual cluster centers based on the previous random assignment. This just keeps repeating, and with every iteration the number of data points that switch clusters goes down, and we arrive at a global optimum. We're now in the coochie gang, a newer version of the hoochie game.

A question arises, though: how do we choose the number of clusters? We could try running k-means multiple times with different cluster numbers. When we plot the result, we can analyze it to see if we chose too many clusters, too few, or just the right amount. Based on our domain knowledge, we can expect to find more than two or three separable clusters from a single electrode recording, and our plot seems to confirm this notion.

Another way to decide this is to use the elbow method. The way that this works is to run k-means several times, increasing the number of clusters every run, and during every run we calculate the average distance of each data point to its cluster center. As the number of clusters increases, the average intra-cluster distance decreases. When we reach six clusters, the average distance to the cluster center does not change anymore, and this is called the elbow point. It gives us a recommendation of how many clusters we should use.

By clustering the data, we're able to sort the neuron spikes into distinct regions, which correlate to different parts of the brain. This is going to be supremely helpful for our client at the hospital, and we just used data science to save a patient's life.

Before we pop champagne, there are three things to remember from this video. Unsupervised learning helps find previously unknown patterns in a data set without needing a label. Principal component analysis is a dimensionality reduction technique that helps find the most relevant features in a data set. And k-means clustering is the most popular clustering technique, grouping similar data points together for further analysis.

What is your next data science project? Let me know in the comment section, and please subscribe for more programming videos. For now, I've got to find myself, so thanks for watching.
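A minimal sketch of the two ideas from the transcript, on made-up data rather than the patient recording. The first part builds 3D points that all lie on a plane, so the third eigenvalue of the covariance matrix should come out at roughly zero and PCA can keep two components without losing anything. The second part runs k-means for increasing k and records scikit-learn's `inertia_`, which is the sum of squared distances to the nearest cluster center (a stand-in for the "average distance" described in the video, not the same quantity). The synthetic plane, the three blobs, and all variable names are assumptions for illustration. `n_init=10` reruns k-means from several starting points because a single run can settle in a local optimum.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)

# --- PCA: 3D points that all lie on a flat 2D plane (synthetic, illustrative) ---
coords_2d = rng.normal(size=(500, 2))            # positions within the plane
plane_basis = np.array([[1.0, 0.0, 1.0],
                        [0.0, 1.0, 1.0]])        # two directions spanning the plane
points_3d = coords_2d @ plane_basis              # shape (500, 3)

# Eigen-decomposition of the covariance matrix: eigenvalue = variance per direction.
cov = np.cov(points_3d, rowvar=False)
eigenvalues, eigenvectors = np.linalg.eigh(cov)  # eigh returns ascending order
eigenvalues = eigenvalues[::-1]                  # largest first

# scikit-learn performs the equivalent projection in one call.
pca = PCA(n_components=2)
points_2d = pca.fit_transform(points_3d)

# --- Elbow method: k-means inertia for increasing k on three synthetic blobs ---
centers = np.array([[0.0, 0.0], [6.0, 0.0], [0.0, 6.0]])
blobs = np.vstack([c + rng.normal(scale=0.5, size=(100, 2)) for c in centers])

inertias = []
for k in range(1, 7):
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(blobs)
    inertias.append(km.inertia_)                 # sum of squared distances to centers

print("eigenvalues:", np.round(eigenvalues, 3))
print("explained variance ratio:", np.round(pca.explained_variance_ratio_, 3))
print("inertia per k:", np.round(inertias, 1))
```

In the printed inertia values, the "elbow" is the k after which adding another cluster stops reducing the value by much. For these three blobs that should be k = 3; the video reports six for its spike data.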

Original Description

Unsupervised learning is the most exciting subfield of machine learning! Finding structure in unstructured data automatically sounds like a dream come true, no need to have a label! In this video, I'll demonstrate 2 types of unsupervised learning techniques: k-means clustering and principal component analysis. We'll use these techniques on neural data from a patient suffering from seizures to see if we can locate the part of their brain in need of surgery to save their life. You'll laugh, you'll cry, but most importantly, you'll learn. Enjoy!

Code for this video: https://github.com/llSourcell/spike_sorting

Please Subscribe! And Like. And comment. That's what keeps me going.

Want more education? Connect with me here:
Twitter: https://twitter.com/sirajraval
Instagram: https://www.instagram.com/sirajraval
Facebook: https://www.facebook.com/sirajology

More learning resources:
https://blog.algorithmia.com/introduction-to-unsupervised-learning/
http://deeplearning.stanford.edu/tutorial/
https://towardsdatascience.com/unsupervised-learning-with-python-173c51dc7f03
https://medium.com/machine-learning-for-humans/unsupervised-learning-f45587588294

Join us at the School of AI: https://theschool.ai/
Join us in the Wizards Slack channel: http://wizards.herokuapp.com/
Please support me on Patreon: https://www.patreon.com/user?u=3191693
Signup for my newsletter for exciting updates in the field of AI: https://goo.gl/FZzJ5w
Hit the Join button above to sign up to become a member of my channel for access to exclusive content!
Join my AI community: http://chatgptschool.io/
Sign up for my AI Sports betting Bot, WagerGPT! (500 spots available): https://www.wagergpt.xyz

This video teaches two unsupervised learning techniques, PCA and K-Means Clustering, for finding hidden patterns in unstructured data, and demonstrates how to apply them with Python and scikit-learn. Because neither technique requires labels, viewers learn how to find structure and the most relevant features in a dataset without a labeled training set.

Key Takeaways

Visualize the data using Python
Extract spikes from the signal by finding data points above a predefined threshold
Align the spikes at their peak amplitude
Use PCA to reduce the dimensionality of the data
Discard directions that carry no variance, such as the third eigenvector with an eigenvalue of zero in the video's 3D example
Use K-Means Clustering to group similar data points
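
The steps above can be sketched end to end on a synthetic signal. This is not the code from the video's repository. The noise level, the two spike templates (`sharp` and `broad`), the threshold of 3.0, the window sizes, and the helper `extract_spikes` are all hypothetical choices made so the example is self-contained. Real recordings would also need filtering and a data-driven threshold, which this sketch skips.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA

rng = np.random.default_rng(42)

# Synthetic stand-in for the recording: noise plus two hypothetical spike shapes.
t = np.arange(-20, 40)                                   # 60-sample template window
sharp = 10.0 * np.exp(-0.5 * (t / 2.0) ** 2)             # narrow, high-amplitude peak
broad = 6.0 * np.exp(-0.5 * (t / 6.0) ** 2)              # wider, lower peak
signal = rng.normal(scale=0.5, size=60_000)
true_labels = []
for start in range(500, 59_000, 300):                    # well-separated spike times
    kind = int(rng.integers(0, 2))
    signal[start:start + 60] += sharp if kind == 0 else broad
    true_labels.append(kind)


def extract_spikes(x, threshold, pre=20, post=40):
    """Cut fixed-length windows around threshold crossings, aligned at the peak."""
    spikes = []
    i = pre
    while i < len(x) - 2 * post:
        if x[i] > threshold:
            # Search a short window after the crossing for the peak sample.
            peak = i + int(np.argmax(x[i:i + post]))
            spikes.append(x[peak - pre:peak + post])
            i = peak + post                              # skip past this spike
        else:
            i += 1
    return np.array(spikes)


waveforms = extract_spikes(signal, threshold=3.0)        # one row per detected spike
features = PCA(n_components=2).fit_transform(waveforms)  # keep 2 principal components
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(features)

print(waveforms.shape, np.bincount(labels))
```

Clustering on the two PCA features rather than on the raw 60-sample waveforms mirrors the order in the list: reduce dimensionality first, then group. The cluster numbers that k-means assigns are arbitrary, so cluster 0 may correspond to either spike shape.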

💡 Unsupervised learning can be used to find hidden patterns in unstructured data, eliminating the need for labeled datasets, and 80% of the world's data is unstructured.
