Does Model Performance Even Matter?

ritvikmath · Advanced ·📐 ML Fundamentals ·4y ago

Key Takeaways

The video discusses the importance of model performance in machine learning, highlighting the trade-offs between accuracy, inference time, and interpretability, using examples with k-nearest neighbor, decision tree, linear regression, and neural network models.

Full Transcript

[Music] Hey everyone, welcome back. Today I want to talk about what makes a model good in machine learning. If this seems like a weird question, that's part of the reason I'm making this video in the first place. When you think about the way machine learning and data science are typically taught in universities and in courses, it really does seem like there's only one real way to tell whether one model is better than another, and that is some kind of performance metric. The exact performance metric you use depends on the problem: if you're solving a classification problem, you might use precision, recall, or accuracy; if you're solving a regression problem, maybe it's mean squared error or mean absolute percent error. They can take many different forms, but they all tell you how well the model is doing on some kind of test or validation set for the problem at hand.

At first glance it seems like of course this is the only thing you should use. But when you start a real data science job in industry, when you get some real hands-on experience with data science, you quickly realize that there are many other ways to tell whether a model is strong or not besides just this performance metric. In fact, there are many reasons why you might actually choose a weaker-performing model if it does well on these other metrics.

Now, to start diving into the specifics, instead of going through all of these metrics one at a time, let's have a little more fun with it and do a couple of case studies. I'll present two different models that you're considering, tell you some information about them, and ask the very simple question: which one is better? Which one do you use?

To start with, let's say you're trying to solve some kind of binary classification problem and you're considering two models. The first is a k-nearest neighbor and the other is a decision tree. Now, the first piece of information I'll
tell you about each model is that the k-nearest neighbor gives you 90 percent accuracy on the validation data set, so let's pretend that accuracy is the performance metric you've chosen to use. The decision tree gives you five percent lower accuracy: 85 percent. Looking at this row alone, just these pieces of information, it seems like k-nearest neighbor is the better way to go here: better accuracy.

Now I'm going to tell you one more piece of information, which is going to make the story more interesting. The k-nearest neighbor has a one-minute inference time. What is inference time? Well, what is inference? Inference is the part of the machine learning pipeline where you have a trained model and you use that trained model to make some kind of prediction, in this case a binary classification, about an unseen sample. You can call this prediction; it's more typically called inference in statistics, so we'll be using that term instead. What it means to have a one-minute inference time is that it takes one minute for this k-nearest neighbor model to take a new sample and tell me the result, the classified label for the sample.

Why would it take this long? Well, k-nearest neighbor is a very interesting model in that it's not really a model at all. The model is your data, and the way you make a prediction about a new sample is to compare it against every single one of your training samples, find the k closest neighbors, and then do something like take a majority vote of their labels to get the predicted label of your new sample. So it could take a very long time, since in the purest k-nearest neighbor form you basically have to search every single training data point.

The decision tree, on the other hand, has a one-second inference time: much, much faster. And it makes sense, because a decision tree, even if you only know a little bit about it, is essentially just a flow chart using the features in your data, so you're not actually needing to store all the data
at all. You're not consulting the training data itself; you're just consulting this flow chart, which behind the scenes is just a series of if statements. Running the new sample's features through a series of if statements is going to be very fast compared to what k-nearest neighbor needs to do.

So, looking at all this information together, I ask you the question: which model do you use? It seems like a bit of a tough question now, because the k-nearest neighbor gives you better accuracy but the decision tree gives you much better inference time. The reason it's a tough question, and the reason there's not really an answer, is that we need to ask a deeper question: what is this model actually for? Without knowing the answer to that, there's really no answer to the question of which is the better model. It depends on what the model is going to be used for, and to hit that point home let me give you two different cases.

Let's say we live in this world, case one, where the model we're considering is used to predict high school dropout, that is, the probability that a high school student will drop out before completing and getting their degree. If this is the case, in my opinion you should go with the k-nearest neighbor model. This is definitely my opinion, but I also think there is a lot of merit in the reasons why I'm going with the k-nearest neighbor here. When you're trying to predict the probability that a student is going to drop out, the units you're operating on are high school students, and getting five percent better accuracy means a lot in the context of decisions impacting real human beings. For example, you're probably going to take some action based on the results of this model: maybe students who are expected to drop out get more resources, and students who are not expected to drop out get fewer resources. So when it becomes a question about outcomes of a human being, I think
accuracy really is a really, really strong metric to go off of, or, more accurately, the performance of your model is very important in that case. Yes, it's going to take a lot longer for each student, but again, you probably don't have that many students. You're not dealing with millions or tens of millions of data points; it's typically maybe thousands at most. So I do think the trade-off is worth it to go with the k-nearest neighbor in that case.

Let's look at case two. What if instead this was a model used to predict the probability that someone will click on a result for some search engine that you're creating? We had some videos in the past where we talked about building a hypothetical search engine for animal lovers, where they type in some kind of keyword and you serve up results they might like. Although it is very important to give people a good search experience, the consequence is not as dire as in predicting high school dropout; it's not going to potentially affect someone's future life outcome. So I do think it makes more sense in this case to go with the decision tree. Yes, it has five percent lower accuracy, so on average the person is not going to like the result as much, but the key insight here, and the reason the decision tree is really the only option that makes sense, is this one-second inference time. In fact, with modern search engines one second is still kind of long, but it's a lot better than a one-minute inference time. Can you imagine typing something into a search engine and the model taking one minute before you get any results back? Nobody in this modern time is going to deal with that. They're just going to leave and go to some other search engine. So you want something that performs really, really fast, even if you're going to take a little bit of an accuracy hit to get there.

Hopefully this starts opening the door to this more nuanced view of
looking at models. It's not all about the accuracy; it's not all about the performance metric. The performance metric does matter, especially in cases like case one, where you're making some decision about a human being's outcome in life, but there are many other things that could matter based on what your case actually is.

Let's look at a regression problem to continue developing this story. Say you are considering two models again, for regression this time. The first one is a simple linear regression model and the other is a complicated neural network model. You don't need to know how either of these works; all you need to know is that neural networks are a fairly sophisticated model and linear regression is a very simple model. Again I have some information about each model, so let's go through it.

The first piece is that the linear regression takes one minute to train. This is a different metric from the ones we've been looking at before. This is training time: given some data set of a fixed size, it's going to take one minute to train this linear regression. Using that same data set, it's going to take 12 hours to train the neural network. This is not really an exaggeration; neural networks are known to require a lot of computing power and a lot of time to train properly, just because of all of the weights that need to get calibrated. So it takes 12 hours to train one and one minute to train the other.

Now let's look at another metric that we haven't looked at yet, which is complexity, or interpretability. I said in the beginning that the linear regression is a lot easier to understand. In its purest form it's just y equals mx plus b. You can have more coefficients, of course, if the data is higher dimensional, but that's the purest form of it. A neural network, you can understand it, and you should, but if you think about people who don't really dabble in data science too much, it's going to take a lot of your time, it's going to take a lot of
resources to fully explain a neural network to them. So this neural network is very hard to explain and interpret, in terms of exactly what's going on under the hood and how it works, whereas the linear regression is very easy to explain and interpret.

Finally, let's look at the performance metric for these. This is a regression problem, so let's say we're going to be using MAPE, or mean absolute percent error, which is just the average absolute percent error from the truth. Let's say the linear regression gives you a 10 percent MAPE and the neural network gives you a 5 percent MAPE. So the neural network is performing better than the linear regression; again, it's a more complex model, so we might expect this. But the linear regression has these other two things going for it: it's very quick to train and it's very easy to explain to somebody else.

Now again, let me ask you the question: which one is better? And of course we need to ask the same follow-up question to have any chance of answering the first one, which is: what is this model for? What is it going to be used for? Let's say this model is being used to predict sales, but let's look at two different cases, which are going to drastically change the decision we make.

Let's say the first case is that you own a small business or something, and this model is going to be used to predict sales in the coming day, tomorrow. Think about that for a second. That means every day you need to train a new model, because you need to use all the data up until now to predict sales tomorrow, and then when tomorrow arrives, use all the data up until then to predict the next day. So this model is getting trained on a daily basis. It doesn't make sense to need 12 hours to train this model, because that doesn't give us a lot of breathing room. For example, imagine that you start training the neural network, and maybe 10 hours go by and your computer crashes, some
kind of error arises. Now you have to start all over, and there's not a lot of time left. You're basically at 22 hours by the time this hopefully succeeds, if it doesn't fail again. That doesn't leave a lot of breathing room for you to have a model that's going to be doing something effective at the end of the day. For that reason alone, the linear regression is probably the better bet here, because it only requires one minute to train. Even if it crashes, you have many, many attempts to try again without needing to worry so much about not having any model at the end of the day.

Now instead, what if this was used to predict not tomorrow's sales but the sales on, let's say, this day of the week next week? Today's Monday, and we're going to be using this model on this Monday to predict sales next Monday, a whole week from now. Now the neural network starts looking like a much better choice, because yes, it takes 12 hours to train, but we have a whole week, so we can accept a certain number of crashes without needing to worry that we're not going to have a model at all. In this case the 12-hour training time is not really getting in our way, and the 5 percent mean absolute percent error is pretty attractive, especially because when you think about time series models, it is extremely hard to predict things that happen further and further in the future. Predicting something a week from now with five percent mean absolute percent error is really, really impressive, and so we'd probably choose the neural network in this case.

We haven't really brought the middle row, about whether the model is easy or hard to explain, into this case study, but you can imagine cases where that could be important too. For example, and I'm just kind of making this up on the fly, let's say you work at some kind of big company and there are many different stakeholders in this project. You might be the data scientist, but there are many other people on the team who are not as data savvy or
educated in these data science techniques as you, but you still need to get their buy-in. You need to explain these models to them to have any chance of getting this into production, because they need to trust what's going on. If that's the case, then you might make the call to use a linear regression, especially if your performance metric is not critical, as it was in case two here, where it would be nice to have a good performance metric but we'll take a slightly worse metric if other things are better. You might choose a linear regression because you can go to them and say, "Here's the formula, literally the formula written out in one line for how this works. You can see that this coefficient is higher than this coefficient, therefore this variable has more of an impact than that variable." It's pretty easy to explain to somebody. A neural network, by contrast, is not going to be the easiest thing to draw on a piece of paper and explain exactly what's going on, because chances are even you are not 100 percent sure why this thing is doing what it's doing.

I hope going through these two case studies was a little bit enlightening, just in terms of seeing that performance metrics are not the only thing that matters when you're training a machine learning model. There are many other things that matter. So to round out this video, let's go through six factors that I can think of. There are probably others, and I would love to hear them in the comments below, but let's just go through these six big ones that I feel are pretty important in deciding whether to use one model or another, and briefly talk about each one.

The first is the performance metric. As much as I've been saying there are other things that are important, the performance metric is of course important. If you had a terrible performance metric, it doesn't really matter how good the other things are, because this thing is not going to be able to do its job. So you need to have your
performance be generally high in some sense; it's just that it's not the only thing you should be looking at.

Another one is training time. We looked at this in the second example, where training time played a big factor. As we move further into the future and get better and better computing resources, the trend is that models are trained on shorter and shorter time frames, sometimes every day, sometimes even more frequently, like every hour. But if you're going to train a model every hour, by definition that model needs to be trainable in much less than an hour, more like minutes, because you can't take longer than an hour to train something that needs to be trained hourly. That makes sense. So training time is another factor you need to think about, based on how frequently you need this model, as we saw in this case.

Training volume: we didn't explicitly touch on that in either of these, but this is a very important one as well. Some machine learning models, and I'm looking at you, neural networks, take a lot of training data to be effective. Other machine learning models, such as linear regression and the simpler models generally, do not require as much training data to be effective at doing their jobs. To give you a quick two-second explainer on why this is true: when you think about a neural network, especially a complicated one with many layers and many nodes per layer, there are a lot of weights to train. When you have, let's say, a million weights but only a data set of size 1,000, then, to put it simply, the model is quite overpowered. There are many configurations of these weights that are going to fit your training and validation data fairly well, but that doesn't necessarily mean those are the weights that are correct, the ones that generalize best. To give you a much easier example, let's say I gave you two points and said, give me a model that is going to fit those two points. If you're using a very simple linear
model, there's only one thing I could draw, right? I could draw the line through those two points. If you tell me that I'm going to use a higher-order polynomial, like a quadratic or a cubic, then there are lots of other curves I could draw. For example, with a quadratic I could draw this guy or I could draw this guy, if you see what I mean. So as you have more and more weights in the model, there are many things that work for your training and validation data, but that doesn't necessarily mean they're going to fit the true trend of the data. Quite simply, there's just not enough data. So the volume required for training is also a very important thing, especially in cases where you don't have a lot of training data to begin with.

Interpretability: very important, very underrated. I've talked about this a lot in previous videos, but put really simply, if two models are the same on all metrics except that one is easier to explain and interpret, go with that one. The easiest solution is the better solution.

Inference time, as we saw, is very important too, especially in cases where you need the prediction to be served very fast, like a search engine. You can't be taking a minute to make a prediction for a search engine; you need to be doing it in milliseconds. So inference time becomes important there.

And finally, storage. We didn't talk about storage either, but some machine learning models take a lot of actual storage room, on the computer or wherever the model is being stored, in order to host the model. I'm looking at k-nearest neighbor for this one. As we said before, k-nearest neighbor is not really a model; it is your training data itself. So if your training data has, you know, dozens of features and millions of rows, you need to just store this thing somewhere, whereas a decision tree, again, is just a series of if statements, which is much more compact, so it doesn't take nearly as much storage. And so you might accept a lower accuracy, a lower precision metric, on the decision tree if
it means that your storage is going to go way down.

So these are just some of the ones that I've thought of. I'm sure there are others; again, please put them in the comments below. The main point of this video is just to get across that there is not one way to tell if a model is good. There are many things you need to look at, and some become important and some become unimportant based on your use case, based on what kind of problem you're trying to solve. So hopefully you enjoyed this video. Please like and subscribe for more like this, and I will see you next time.
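
The k-nearest neighbor versus decision tree contrast from the first case study can be sketched in a few lines of Python. This is only an illustrative sketch, not code from the video: the toy training set, the function names knn_predict and tree_predict, and the split thresholds in the hand-written tree are all assumptions made up for the example. It shows why brute-force k-nearest neighbor has to keep and scan every training row at inference time, while a trained decision tree boils down to a handful of if statements.

```python
from collections import Counter
import math

# Hypothetical toy training set of (features, label) pairs; values are made up.
TRAIN = [
    ((1.0, 1.0), 0),
    ((1.5, 2.0), 0),
    ((2.0, 1.0), 0),
    ((6.0, 5.0), 1),
    ((7.0, 7.0), 1),
    ((8.0, 6.0), 1),
]


def knn_predict(sample, train=TRAIN, k=3):
    """Brute-force k-nearest neighbor: the 'model' is the training data itself."""
    # Measure the distance to every stored training point, so the work
    # (and the storage) grows with the number of training rows.
    by_distance = sorted(train, key=lambda row: math.dist(sample, row[0]))
    # Majority vote among the labels of the k closest points.
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]


def tree_predict(sample):
    """Stand-in for a trained decision tree: a flow chart of if statements.

    The thresholds are invented for illustration; a real tree would learn them.
    No training data is consulted at prediction time.
    """
    x, y = sample
    if x < 4.0:
        return 0
    if y < 3.0:
        return 0
    return 1
```

Calling knn_predict((7.0, 6.0)) and tree_predict((7.0, 6.0)) returns the same label on this toy data, but only the first call touches the training set. Real libraries often speed up neighbor search with index structures, so the one-minute versus one-second gap in the video should be read as an illustration rather than a fixed property of the two methods.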

Original Description

My Patreon : https://www.patreon.com/user?u=49277905 Picture Refs : https://www.freepik.com/photos/pattern Pattern photo created by jcomp

The video teaches the importance of considering multiple factors in model selection, including accuracy, inference time, and interpretability, and provides examples with different machine learning models. It highlights the trade-offs between these factors and the need for a nuanced approach to evaluating model performance. By watching this video, viewers can learn how to select the most appropriate model for their specific use case.

Key Takeaways
  1. Compare the performance of different models, such as k-nearest neighbor and decision tree, based on accuracy and inference time
  2. Train a linear regression model and a neural network model to predict sales
  3. Evaluate the performance of the models using metrics such as mean absolute percent error
  4. Consider the interpretability and training time of the models in addition to their performance
  5. Select the most appropriate model based on the specific use case and requirements
💡 Model performance is not the only factor to consider in machine learning, and a nuanced approach that takes into account multiple factors, including accuracy, inference time, and interpretability, is necessary for effective model selection.
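
As a small worked example of the mean absolute percent error mentioned in the takeaways, here is a minimal Python sketch. The function name mape and the sales figures are assumptions invented for illustration; the numbers are chosen so the two hypothetical forecasts land near the 10 percent and 5 percent values used in the video.

```python
def mape(actual, predicted):
    """Mean absolute percent error, returned as a percentage.

    Assumes no actual value is zero, since each error is divided by the
    actual value it is compared against.
    """
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and the same length")
    total = sum(abs((a - p) / a) for a, p in zip(actual, predicted))
    return 100.0 * total / len(actual)


# Hypothetical daily sales and two made-up sets of forecasts.
actual_sales = [100.0, 200.0, 400.0]
linear_forecast = [110.0, 180.0, 440.0]   # each forecast is off by 10 percent
network_forecast = [105.0, 190.0, 420.0]  # each forecast is off by 5 percent

linear_mape = mape(actual_sales, linear_forecast)    # roughly 10
network_mape = mape(actual_sales, network_forecast)  # roughly 5
```

The metric alone would favor the second forecast here; the training time and interpretability considerations from the video sit outside this calculation and have to be weighed separately.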