Collaborative Filtering : Data Science Concepts
Key Takeaways
The video discusses collaborative filtering, a technique used in recommender systems. It explains how the method identifies similar users with cosine similarity and predicts a rating as a similarity-weighted average of other users' ratings. It also covers three practical barriers: sparse rating matrices, scalability, and gray sheep or black sheep users.
Full Transcript
[Music] Hey everyone, welcome back. Today we're going to talk about a really cool, really intuitive concept in data science called collaborative filtering. You might have heard of this before, because it's used a lot in recommender systems, so let's go right into today's example.

Let's say that you have started a new video streaming service called Statflix, where you host shows that are all about statistics for people to watch. You have a couple of users now, and you run into this problem: I need to know what to recommend to my users. For example, a user who's been using the site for a while has logged in, and you have to know what to recommend to them next in order to make them the happiest. That's where collaborative filtering comes in.

The driving principle, or the key idea, behind collaborative filtering is that past similar preferences can inform future preferences. To give a little more context: of course I am not the only user on this Statflix site. There are many other users, and they all have the ability to rate the different pieces of content. They'll watch some show and like it or not like it, and they can rate it with a score from one to five, five meaning "I really, really like this show." Now, if a couple of users have given more or less the same ratings to the same pieces of content, then I can consider these users similar, which means that they have similar tastes, similar preferences, things like that. So if one of those users comes along and I need to know what to recommend to them next, I might go visit the similar users, people who are just like them, see what those people have liked in the past, and recommend this person one of those pieces of content. This is a pretty intuitive, pretty genius idea that I think people use in the real world even if they have never thought about stats before, which is just that I might like things that my friends like
because my friends are similar to me, somewhat, right? So that's the driving principle behind collaborative filtering.

Let's look at a small example to get the idea, and then at the end of this video I'll talk about some different considerations you might want to think about. We're going to assume that there are only three users on Statflix so far. Of course there are going to be many more in a real recommender system, but keeping things simple: three users, and we have eight shows. All that information is captured in this table here. On the rows we have user one, user two, and user three, and on the columns we have the eight different shows that are available on Statflix. Each user has the ability to rate each show, as I mentioned, with a score from one to five, but that doesn't mean that they're going to rate every single show. Sometimes people just will not rate a show, or they haven't watched the show yet, and therefore their rating will be blank. So you see that many of these cells are filled in, but many of them are also blank.

The question for today is: what should I recommend next for user u1? If we look at u1, they've rated five of the eight shows, which means that we're going to assume that they haven't yet watched show four, five, or six, and I need to know which show I should recommend to them next. The next time they sign in, I should show this one to them and hope that they like it.

Now, if we were really dumb about this and didn't take into account the fact that there are similarities between users, we might just take an average of the ratings of these mystery shows across the other users. For example, we see that shows four and five have been reviewed by users two and three, so we might just take a simple average. For show four, user two gave it a rating of two and user three gave it a rating of five, and if we average those two numbers we get 3.5. We actually get the same exact number for show five, and for show six we
don't have any data, so we just don't think about that one. So if we didn't take into account this intuitive idea of similarities between users, we would find that both of these shows have an average rating of 3.5 across the entire site, across all the users, and you still wouldn't really know which one to recommend to user u1.

Now let's be a little bit smarter about this and take into account the similarities between these users. I've taken this table and broken it down into two smaller tables of only two rows each: u1 versus u2, and u1 versus u3. I want to look at these tables and get an idea of who u1 is more similar to. Is u1 more similar to user 2, or is u1 more similar to user 3?

Take the table of just u1 and u2. The numbers you're looking at here are just the common numbers between these two users. For example, we have these two fives down here, we have the four and three down here, we have the two and two down here, and we have the one and one over here. If either of these users has reviewed a piece of content but the other one has not, we don't include that pair; we only include pairs where both of them have reviewed it. Now let's stare at this for a second and realize that these two preference lists are very, very similar. The only real difference comes from this four and three, and those two numbers are pretty close together to begin with. So what I get an idea of is that user one and user two are rating these pieces of content very similarly, which is a very powerful concept, because it means that if user 2 likes something, user 1 might like that also.

Now let's see about user 1 and user 3.
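As an aside, the pairing step just described for u1 and u2 can be sketched in Python. This is a minimal illustration rather than the video's own code: the paired values (5, 5), (4, 3), (2, 2), (1, 1) and user 2's rating of 2 for show four come from the transcript, but the transcript does not say which show each of the other ratings belongs to, so the show labels (`s1`, `s2`, ...) and the helper name `co_rated` are assumptions.

```python
# Ratings stored as dicts keyed by show; a missing key means "not rated".
# ASSUMPTION: the assignment of ratings to show labels is hypothetical.
# Only the paired values themselves are taken from the transcript.
u1 = {"s1": 5, "s2": 1, "s3": 4, "s7": 2, "s8": 4}
u2 = {"s1": 5, "s2": 1, "s3": 3, "s4": 2, "s7": 2}


def co_rated(a, b):
    """Return both users' ratings restricted to shows that both have rated."""
    shared = sorted(a.keys() & b.keys())
    return [a[s] for s in shared], [b[s] for s in shared]


v1, v2 = co_rated(u1, u2)
print(v1, v2)
```

Shows rated by only one of the two users (here `s4` and `s8`) drop out, which is the "only include pairs where both of them have reviewed it" rule.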
So I've done the same thing. It turns out that they have five pieces of content in common, so I have five columns here, and let's take a look at whether these preferences are similar. This is a five and a one; that's about as opposite as you can get. This is a one and a four, also not very similar. Four and two, not too similar. Two and five. Four and one. So these two users don't seem to be agreeing on very much of anything. In my mind I would say that user one and user three are not that similar to each other, so if user three likes something, it's not very probable that user one will actually like it.

So how do I take these similarities into account mathematically? Let's say I want to know what to recommend to u1. I first need to devise some kind of mathematical metric of similarity between these pairs of users. The most commonly used metric, but not the only one, is cosine similarity. The cosine similarity between any two users, user i and user j, is given by this formula, but in simpler terms it is the cosine of the angle between their two rating vectors. You can imagine that we have one vector for u1 and another vector for u3, and although these vectors are five-dimensional, so I can't draw them for you, you can imagine that there's some angle between them. If this were in two dimensions, you could imagine that I have two vectors like this, and the closer they are together, the smaller the angle is going to be between them, and the cosine of a small angle is closer to one. So in a nutshell, what cosine similarity is saying is that the closer these two vectors are together, or the smaller the angle between them, the higher the similarity, the closer to one, I will give them. The other end of the story is that the further apart these vectors are, the bigger the angle between them is going to be, and therefore the cosine of that angle is going to be smaller, and therefore the similarity score between these
two users will be smaller. So that whole story does check out, and that's why people tend to use cosine similarity so much.

I won't actually go through the calculation; that's not too important for this video. Suffice it to say that if I take the similarity between user one and user two, so the cosine similarity between this vector and this vector, I'm going to get 0.99. This is almost one, almost as high as it could be. On the other hand, if I take the cosine similarity between vectors u1 and u3, so this vector and this vector, I get 0.57, which isn't nearly as high. So this does match up with our fuzzy understanding from before.

Now what do I do with these numbers? I do a very intuitive thing. Say I want to know the estimated rating that user one would give to piece of content number four. Before, I was just saying that it would be the average, but now I know a little bit better. Now I'm going to take a weighted average, and the weights are given by the similarities between user one and the other two users. Here is the story that's being told (s12, by the way, is just shorthand for the similarity between user one and user two). The estimated rating that user 1 would give to piece of content number 4 is the similarity between 1 and 2 times the score that user 2 gave to piece of content number 4, which is two. So basically I'm giving it a rating of two, but the weight I'm putting on it is only as big as the similarity between user one and user two. I add to that the rating that user three gave to that piece of content, which is five, but of course I also need to weight that by the similarity between user one and user three, which is given here as s13. Then I divide that whole thing by s12 plus s13, just because I need to normalize. If you notice, these two numbers do not add up to one, so I need to make sure to keep everything in the same bounds, and therefore that's my denominator. So when I do that, I get that the
estimated rating that user one would give to piece of content four is actually 3.1, which is lower than 3.5. Does that intuitively make sense? Yes, because now I'm taking user 2's preferences into account a lot more, and since user 2 really did not like this piece of content, I'm shifting my 3.5 down to 3.1. If I do the same calculation for r15 (I didn't explicitly show the steps, but they're pretty much the same), this answers the question of what estimated rating user one would give to piece of content number five. Notice that user two really liked piece of content number 5, which means that I'm going to shift my score up from 3.5, and that's why you're getting 3.9.

Now I can give an intuition about where the words "collaborative filtering" come from. The filtering part is basically making automatic predictions about a user, and the collaborative part, as you might have guessed, is that we're making these predictions based on collaboration with all the other users in this environment. So using collaborative filtering, I'm able to determine that I should recommend piece of content number five to user one, because it has a higher score: 3.9 versus 3.1. And that's how collaborative filtering works in a nutshell.

To end this video, I want to talk about three big barriers to collaborative filtering. This was just a toy example using Statflix, but there are a couple of barriers you run into in the real world that you do need to think about.

The first big barrier, and the one that's talked about most, is sparsity. If you notice, we had a couple of blank cells here, but it was nothing that really prevented us from doing our job. But think about a real recommendation system, with tons of pieces of content and tons of users. I think most users don't actually rate anything. They're just there to watch their show, and then they're done with it; they don't
really take the time to review it. What that means is that your matrix, which is going to be very big in both directions, is going to be very sparse, meaning it's going to have a lot of empty cells. This is a problem for collaborative filtering because, remember, the whole heart of collaborative filtering is that I need information about people who are similar to you, but if nobody is rating anything, I can't get that information very reliably. So collaborative filtering does rely on your matrix not being too sparse.

Another issue, which goes along with the fact that real-life matrices are going to be much bigger, is scalability. If you notice, we had to do quite a few computations here: this cosine similarity, this weighted average. If you have a lot of users or a lot of shows, this might slow down considerably. So this is something to think about when you actually write the code for collaborative filtering: how do you do this in a way that is efficient and won't slow down your system too much?

The last barrier that I'll talk about is the gray sheep and black sheep problems. Let's say we have tons of users, with one cluster of users around here and another cluster of users around here. Gray sheep are those that don't really fit too well into either cluster; they're kind of on the border, and we don't really know which one to assign them to. This can be an issue in collaborative filtering, or in recommendation in general. Black sheep problems are when we have users that are not close to either cluster at all; they're kind of just on an island by themselves, so again we're not too sure what to recommend to these users. But this last problem, at least, is not specific to collaborative filtering; it is a problem of recommendation systems in general.

So I think that's all I had to say. I hope you learned about collaborative filtering, and how intuitive and interesting it is, in this video. If you
have any comments at all, please post them below. I hope you liked this video. Please like and subscribe for more videos just like this, and until next
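The similarity and weighted-average steps from the worked example can be reproduced with a short Python sketch. This is an illustration under stated assumptions, not the video's own code. The co-rated rating vectors and the show four ratings (2 from user two, 5 from user three) are taken from the transcript. The show five ratings (5 and 2) are inferred from the statements that show five also averages 3.5 and that user two really liked it. The function names `cosine_similarity` and `predict` are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length rating vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def predict(similarities, ratings):
    """Similarity-weighted average of other users' ratings for one show."""
    weighted = sum(s * r for s, r in zip(similarities, ratings))
    return weighted / sum(similarities)


# Co-rated pairs read off the transcript (u1's ratings first in each pair).
s12 = cosine_similarity([5, 4, 2, 1], [5, 3, 2, 1])
s13 = cosine_similarity([5, 1, 4, 2, 4], [1, 4, 2, 5, 1])

# Show four: user two rated it 2, user three rated it 5 (from the transcript).
r14 = predict([s12, s13], [2, 5])
# Show five: ratings of 5 and 2 are INFERRED, not stated outright.
r15 = predict([s12, s13], [5, 2])

print(round(s12, 2), round(s13, 2))  # similarities of u1 to u2 and u3
print(round(r14, 1), round(r15, 1))  # estimated ratings for shows 4 and 5
```

Under these inputs the sketch agrees with the numbers quoted in the video: similarities of about 0.99 and 0.57, and estimated ratings of about 3.1 for show four and 3.9 for show five. Dividing by the sum of the similarities is the normalization step, which keeps the estimate on the same one-to-five scale as the ratings.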
Original Description
How do recommendation engines work?