Collaborative Filtering : Data Science Concepts

ritvikmath · Intermediate · 📐 ML Fundamentals · 5y ago

Key Takeaways

The video discusses collaborative filtering, a technique used in recommender systems, and explains how it works by identifying similar users and averaging their ratings, weighted by similarity, to make predictions. It also covers real-world barriers: sparse rating matrices, scalability, and gray sheep and black sheep users.

Full Transcript

[Music] Hey everyone, welcome back. Today we're going to talk about a really cool, really intuitive concept in data science called collaborative filtering. You might have heard of it before, because it's used a lot in recommender systems, so let's go right into today's example. Let's say that you have started a new video streaming service called Statflix, where you host shows that are all about statistics for people to watch. You have a couple of users now, and you run into this problem: I need to know what to recommend to my users. For example, a user who has been using the site for a while has logged in, and you have to decide what to recommend to them next in order to make them the happiest. That's where collaborative filtering comes in.

The driving principle, the key idea behind collaborative filtering, is this: past similar preferences can inform future preferences. To give a little more context, of course I am not the only user on Statflix. There are many other users, and they all have the ability to rate the different pieces of content. They'll watch some show and they'll like it or not, and maybe they can rate it on a score of one to five, five meaning "I really, really like this show." Now, if I have a couple of users who have given more or less the same ratings to the same pieces of content, then I can consider these users similar, which means they have similar tastes, similar preferences, things like that. So if one of those users comes along and I need to know what to recommend to them next, I might go visit the similar users, people who are just like them, see what those people have liked in the past, and recommend one of those pieces of content to this person. This is a pretty intuitive, pretty genius idea that I think people use in the real world even if they have never thought about stats before, which is just that I might like things that my friends like
because my friends are somewhat similar to me, right? So that's the driving principle behind collaborative filtering. Let's look at a small example to get the idea, and then at the end of this video I'll talk about some different considerations you might want to think about.

We're going to assume that there are only three users on Statflix so far. Of course there are going to be many more in a real recommender system, but we'll keep things simple: three users and eight shows. All that information is captured in this table here. On the rows we have user 1, user 2, and user 3, and on the columns we have the eight different shows that are available on Statflix. Each user has the ability to rate each show, as I mentioned, on a score of one to five, but that doesn't mean they're going to rate every single show. Sometimes people just won't rate a show, or they haven't watched it yet, and therefore their rating will be blank. So you see that many of these cells are filled in, but many of them are also blank.

The question for today is: what should I recommend next to user U1? If we look at U1, they've rated five of the eight shows, which means we're going to assume that they haven't yet watched show 4, 5, or 6, and I need to know which show to recommend to them next, so that the next time they sign in I can show it to them and hope that they like it. Now, if we were really dumb about this and didn't take into account the similarities between users, we might just take an average of the other users' ratings of these unwatched shows. For example, we see that shows 4 and 5 have been rated by users 2 and 3, so we might just take a simple average. For show 4, user 2 gave it a rating of 2 and user 3 gave it a rating of 5, and if we average those two numbers we get 3.5. We actually get the exact same number for show 5. For show 6 we
don't have any data, so we just won't consider that one. So if we didn't take into account this intuitive idea of similarity between users, we would find that both of these shows have an average rating of 3.5 across the entire site, across all the users, and you still wouldn't really know which one to recommend to user U1.

Now let's be a little bit smarter about this and take into account the similarities between these users. I've taken this table and broken it down into two smaller tables with only two rows each: U1 versus U2, and U1 versus U3. I want to look at these tables and get an idea of whom U1 is more similar to: is U1 more similar to user 2 or to user 3? Take the table of just U1 and U2. The numbers you're looking at here are only the ratings for content that both users have rated. For example, we have these two 5s down here, the 4 and 3 down here, the 2 and 2 down here, and the 1 and 1 over here. If either of these users has rated a piece of content but the other one has not, we don't include that pair; we only include pairs where both of them have rated it. Now let's stare at this for a second and realize that these two preference lists are very, very similar. The only real difference comes from this 4 and 3, and those two numbers are pretty close together to begin with. So my sense is that user 1 and user 2 are rating these pieces of content very similarly, which is a very powerful concept, because it means that if user 2 likes something, user 1 might like it too. Now let's see about user 1 and user 3.
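The naive site-wide average and the "only co-rated pairs" rule described so far can be sketched in a few lines of Python. This is an illustrative sketch, not code from the video: the function names `site_wide_average` and `co_rated` are hypothetical, blank cells are assumed to be stored as `None`, and the two short rows in the last example are made up.

```python
from statistics import mean

def site_wide_average(ratings):
    """Plain mean of every rating a show received; ignores user similarity."""
    observed = [r for r in ratings if r is not None]
    return mean(observed) if observed else None  # None: nobody rated the show

def co_rated(a, b):
    """Keep only the positions where both users left a rating (None = blank)."""
    pairs = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    return [x for x, _ in pairs], [y for _, y in pairs]

# Show 4: user 2 rated it 2 and user 3 rated it 5, as read out in the video.
print(site_wide_average([2, 5]))        # 3.5
# Show 6: no ratings at all, so there is nothing to average.
print(site_wide_average([None, None]))  # None

# Hypothetical rows for two users over five shows.
print(co_rated([5, None, 4, 2, None], [5, 3, None, 2, 1]))  # ([5, 2], [5, 2])
```

The co-rated filter matters because the similarity step compares users position by position, so both vectors must refer to the same shows in the same order.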
So I've done the same thing. It turns out that they have five pieces of content in common, so I have five columns here. Let's look at whether these preferences are similar. This is a 5 and a 1; that's about as opposite as you can get. This is a 1 and a 4, also not very similar. 4 and 2, not too similar. 2 and 5. 4 and 1. So these two users don't seem to agree on much of anything, and in my mind I would say that user 1 and user 3 are not that similar to each other. So if user 3 likes something, it's not very probable that user 1 will actually like it.

So how do I take these similarities into account mathematically? Let's say I want to know what to recommend to U1. I first need to devise some kind of mathematical metric of similarity between each pair of users. The most commonly used metric, but not the only one, is cosine similarity. The cosine similarity between any two users, user i and user j, is given by this formula, but in simpler terms it is the cosine of the angle between their two rating vectors. You can imagine that we have one vector for U1 and another vector for U3, and although these vectors are five-dimensional, so I can't draw them for you, you can imagine that there's some angle between them. If this were in two dimensions, you could imagine that I have two vectors like this: the closer they are together, the smaller the angle between them, and the cosine of a small angle is closer to one. So in a nutshell, cosine similarity says that the closer these two vectors are together, that is, the smaller the angle between them, the higher the similarity, the closer to one I will score them. The other end of the story is that the further apart these vectors are, the bigger the angle between them, and therefore the smaller the cosine of that angle, so the similarity score between these
two users will be smaller. So that whole story does check out, and that's why people tend to use cosine similarity so much. I won't actually go through the calculation, since it's not too important for this video, but suffice it to say that if I take the similarity between user 1 and user 2, that is, the cosine similarity between this vector and this vector, I get 0.99. So this is almost one, almost as high as it could be. On the other hand, if I take the cosine similarity between vectors U1 and U3, this vector and this vector, I get 0.57, which isn't nearly as high. So this matches our fuzzy understanding from before.

Now what do I do with these numbers? I do a very intuitive thing. If I want to know the estimated rating that user 1 would give to piece of content number 4, before I was just saying that it would be the average, but now I know a little better: I'm going to take a weighted average, and those weights are given by the similarities between user 1 and the other two users. By the way, s12 is just shorthand for the similarity between user 1 and user 2. So I'm saying that the estimated rating that user 1 would give to piece of content number 4 is the similarity between users 1 and 2 times the score that user 2 gave to piece of content number 4, which is 2. So basically I'm counting a rating of 2, but the weight I put on it is only as big as the similarity between user 1 and user 2. I add that to the rating that user 3 gave to that piece of content, which is 5, but of course I also need to weight that by the similarity between user 1 and user 3, which is s13 here. And I divide that whole thing by s12 plus s13, because I need to normalize: if you notice, these two numbers do not add up to one, so I need to make sure to keep everything within the same bounds, and that's my denominator. When I do that, I get that the
estimated rating that user 1 would give to piece of content 4 is actually 3.1, which is lower than 3.5. Does that intuitively make sense? Yes, because now I'm taking user 2's preferences into account a lot more, and since user 2 really did not like this piece of content, I'm shifting my 3.5 down to 3.1. I do the same calculation for r15; I didn't explicitly show the steps, but they're pretty much the same. This answers the question of what rating user 1 would give to piece of content number 5. Notice that user 2 really liked piece of content number 5, which means I'm going to shift my score up from 3.5, and that's why you get 3.9.

Now I can give some intuition about where these words come from. The filtering part is basically making automatic predictions about a user, and the collaborative part, as you might have guessed, is that we're making these predictions based on collaboration with all the other users in this environment. So using collaborative filtering, I'm able to determine that I should recommend piece of content number 5 to user 1, because it has a higher score, 3.9 versus 3.1. That's how collaborative filtering works in a nutshell.

To end this video, I want to talk about three big barriers to collaborative filtering, because this was just a toy example using Statflix, and there are a couple of barriers you run into in the real world that you do need to think about. The first big barrier, and the one that's talked about most, is sparsity. If you notice, we had a couple of blank cells here, but nothing that really prevented us from doing our job. But think about a real recommendation system, with tons of pieces of content and tons of users. I think most users don't actually rate anything; they're just there to watch their show, and then they're done with it. They don't
really take the time to review it. What that means is that your matrix, which is going to be very big in both directions, is going to be very sparse: it's going to have a lot of empty cells. This is a problem for collaborative filtering, because remember, the whole heart of collaborative filtering is that I need information about people who are similar to you, and if nobody is rating anything, I can't get that information very reliably. So collaborative filtering does rely on your matrix not being too sparse.

Another issue that goes along with the fact that real-life matrices are going to be much bigger is scalability. If you notice, we had to do quite a few computations here: the cosine similarities and the weighted average. So if you have a lot of users or a lot of shows, this might slow down considerably. This is something to think about when you actually write the code for collaborative filtering: how do you do this in a way that is efficient and won't slow down your system too much?

The last barrier I'll talk about is the gray sheep and black sheep problems. Let's say we have tons of users, with one cluster of users around here and another cluster of users around here. Gray sheep are users who don't really fit well into either cluster; they're kind of on the border, and we don't really know which one to assign them to. This can be an issue in collaborative filtering, or in recommendation in general. Black sheep are users who are not close to either cluster at all; they're kind of on an island by themselves, so again we're not too sure what to recommend to them. But this last problem is not specific to collaborative filtering; it's a problem of recommendation systems in general. So I think that's all I had to say. I hope you learned about collaborative filtering and how intuitive and interesting it is in this video. If you
have any comments at all, please post them below. I hope you liked this video; please like and subscribe for more videos just like this, and until next
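The similarity and weighted-average steps from the transcript can be checked with a short, dependency-free Python sketch. It uses the co-rated rating pairs read out in the video (U1 vs U2 and U1 vs U3) and the show 4 ratings from users 2 and 3 (2 and 5). The video does not read out the show 5 ratings; the values used below (user 2: 5, user 3: 2) are an assumption, chosen because they are the only pair of whole-number ratings that averages 3.5 and reproduces the 3.9 estimate. The function names are illustrative.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors of co-rated ratings."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def weighted_prediction(similarities, ratings):
    """Similarity-weighted average, normalized by the sum of the similarities."""
    numerator = sum(s * r for s, r in zip(similarities, ratings))
    return numerator / sum(similarities)

# Co-rated pairs from the video; position k in both vectors is the same show.
s12 = cosine_similarity([5, 4, 2, 1], [5, 3, 2, 1])
s13 = cosine_similarity([5, 1, 4, 2, 4], [1, 4, 2, 5, 1])

r14 = weighted_prediction([s12, s13], [2, 5])  # show 4: user 2 -> 2, user 3 -> 5
r15 = weighted_prediction([s12, s13], [5, 2])  # show 5: ASSUMED user 2 -> 5, user 3 -> 2

print(round(s12, 2), round(s13, 2))  # 0.99 0.57
print(round(r14, 1), round(r15, 1))  # 3.1 3.9
print("recommend show", 4 if r14 > r15 else 5)  # recommend show 5
```

Because every rating here is positive, cosine similarity cannot drop below 0, which helps explain why even the mostly disagreeing users 1 and 3 still score 0.57.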

Original Description

How do recommendation engines work?


This video teaches collaborative filtering, a technique used in recommender systems, and explains how it works by identifying similar users and taking a similarity-weighted average of their ratings to make predictions. It also covers the practical challenges of sparse rating matrices and scalability. By watching it, viewers can learn the intuition behind collaborative filtering and the issues to plan for when implementing it in a recommender system.

Key Takeaways
  1. Identify similar users as those who give similar ratings to the same content
  2. Average the ratings of similar users to make recommendations
  3. Compare users pairwise, using only the content both users have rated
  4. Calculate cosine similarity between two users based on their ratings
  5. Use a weighted average of other users' ratings to predict a user's rating for content they have not seen
  6. Weight each rating by the similarity between the target user and the user who gave it
  7. Divide by the sum of the similarities so that the effective weights add up to 1
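Items 4 through 7 can be written out as formulas. This is a sketch in the video's notation (s12 for the similarity of users 1 and 2, r14 for user 1's rating of show 4); the hat marking an estimate and the set I_{ij} of shows rated by both users i and j are labels introduced here, matching the video's choice to compare only co-rated pairs.

```latex
\[
s_{ij} = \cos\theta_{ij}
       = \frac{\sum_{k \in I_{ij}} r_{ik}\, r_{jk}}
              {\sqrt{\sum_{k \in I_{ij}} r_{ik}^{2}}\;\sqrt{\sum_{k \in I_{ij}} r_{jk}^{2}}}
\]
\[
\hat{r}_{14} = \frac{s_{12}\, r_{24} + s_{13}\, r_{34}}{s_{12} + s_{13}}
             = \frac{0.99 \cdot 2 + 0.57 \cdot 5}{0.99 + 0.57}
             = \frac{4.83}{1.56} \approx 3.1
\]
```

Dividing by s_{12} + s_{13} makes the effective weights s_{12}/(s_{12}+s_{13}) and s_{13}/(s_{12}+s_{13}) sum to 1, so with nonnegative similarities the estimate stays within the 1 to 5 rating scale.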
💡 Collaborative filtering relies on a matrix of user ratings, but real-life matrices are often sparse because few users leave ratings, and computing similarities across many users and items makes scalability a challenge.
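For the scalability point, one common approach is to replace per-pair loops with matrix operations. The NumPy sketch below is an illustration under stated assumptions, not the video's code: blank cells are stored as `np.nan`, each pair of users is still compared only on co-rated shows, and the 3 × 8 table is a hypothetical reconstruction whose column positions are invented. Only the rating pairs and the show 4 ratings come from the video; the show 5 ratings (5 and 2) are assumed.

```python
import numpy as np

nan = np.nan
# HYPOTHETICAL reconstruction of the Statflix table (rows: users 1-3, columns: shows 1-8).
R = np.array([
    [5, 1, 4, nan, nan, nan, 2, 4],
    [5, 1, 3, 2, 5, nan, 2, nan],
    [1, 4, 2, 5, 2, nan, 5, 1],
])

def similarity_matrix(R):
    """All-pairs cosine similarity, each pair restricted to the shows both users rated."""
    M = (~np.isnan(R)).astype(float)   # 1.0 where a rating exists
    X = np.nan_to_num(R)               # blanks -> 0, so they drop out of every sum
    dot = X @ X.T                      # sum over co-rated shows of r_ik * r_jk
    partial = np.sqrt((X ** 2) @ M.T)  # partial[i, j]: norm of user i over the shows j rated
    denom = partial * partial.T
    S = np.divide(dot, denom, out=np.zeros_like(dot), where=denom > 0)
    np.fill_diagonal(S, 0.0)           # a user does not vote on their own prediction
    return S, X, M

def predict_all(R):
    """Similarity-weighted average for every (user, show); NaN where nobody rated the show."""
    S, X, M = similarity_matrix(R)
    num = S @ X                        # sum over users v of s_uv * r_vk
    den = S @ M                        # sum of s_uv over users v who rated show k
    return np.divide(num, den, out=np.full_like(num, nan), where=den > 0)

P = predict_all(R)
unrated = np.isnan(R[0])
best = int(np.nanargmax(np.where(unrated, P[0], nan)))
print(np.round(P[0, 3:6], 1), "-> recommend show", best + 1)
```

The dense products still cost on the order of users² × shows, so this only removes Python-level looping; at real scale you would likely also want sparse matrix storage. Show 6 comes out as NaN, which is the sparsity problem in miniature: with no ratings at all, there is nothing to predict from.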
