Why Logistic Regression DOESN'T return probabilities?!
Key Takeaways
The video discusses why logistic regression doesn't always return true probability values and explains model calibration using techniques like Platt scaling and isotonic regression, with tools such as scikit-learn and GitHub.
Full Transcript
In machine learning, classification problems are very common. The input is a set of features, and the output is a continuous value between 0 and 1 that we interpret as a probability. But in reality, these values aren't necessarily reflective of true probability values. Now, a main question here is: how does that even happen? The primary reason is data imbalance. In problems like fraud detection, where we have few positive samples, it makes sense to undersample the negative class or overweight the positive class when we train a model. We do this so that the model can detect these positive instances, but as a result the probabilities returned by the model may be skewed higher. To make sure the values reflect true probabilities, we need to calibrate the model, and we'll see this in code later on too.

Now, how do we actually calibrate a model? We train a model as we usually would for a classification problem. We then pipe its output into a calibration model and use it as that model's single feature; the corresponding label is the actual label used for the original model too.
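The two-step recipe described above (train a base model, then fit a one-feature calibration model on its output) can be sketched by hand as follows. This is a minimal illustration and not the notebook from the video: the synthetic data settings, the `random_state` values, and the names `base`, `calibrator`, `p_uncal`, and `p_cal` are all assumptions, and using the base model's `decision_function` score as the single feature is one reasonable reading of "its output".

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic, imbalanced data (assumed setup: roughly 10% positives).
X, y = make_classification(
    n_samples=10_000, n_features=10, n_informative=10, n_redundant=0,
    weights=[0.9, 0.1], random_state=0,
)

# 80/10/10 split: train, validation (used only for calibration), and test.
X_train, X_rest, y_train, y_rest = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)
X_val, X_test, y_val, y_test = train_test_split(
    X_rest, y_rest, test_size=0.5, stratify=y_rest, random_state=0
)

# Step 1: train the base model as usual, up-weighting the rare positives.
base = LogisticRegression(class_weight="balanced", max_iter=1000)
base.fit(X_train, y_train)

# Step 2: the base model's score is the calibrator's single feature,
# and the label is the same true label the base model was trained against.
scores_val = base.decision_function(X_val).reshape(-1, 1)
calibrator = LogisticRegression()
calibrator.fit(scores_val, y_val)

# Step 3: at prediction time, chain the two models.
scores_test = base.decision_function(X_test).reshape(-1, 1)
p_uncal = base.predict_proba(X_test)[:, 1]
p_cal = calibrator.predict_proba(scores_test)[:, 1]

print(f"true positive rate in test set: {y_test.mean():.3f}")
print(f"mean uncalibrated probability:  {p_uncal.mean():.3f}")
print(f"mean calibrated probability:    {p_cal.mean():.3f}")
```

Fitting the calibrator on held-out validation data, rather than on the training set, keeps the calibration step from reusing examples the base model has already seen.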
Now, there are two main types of calibration methods that are widely used: Platt scaling and isotonic regression. Platt scaling basically uses a logistic regressor and is better suited to simpler cases, whereas isotonic regression fits a more flexible, non-decreasing piecewise (step-wise) function and can capture more complex relationships. You can try either method and see how your model does, depending on the model and on your data as well. With that primer out of the way, let's get into some code.

So I opened up a notebook, installed scikit-learn, and imported a bunch of packages from it. First we have make_classification, which is used to create our classification data set; I'm just going to be creating dummy data. CalibratedClassifierCV actually performs, behind the scenes, the calibration that I discussed previously. calibration_curve is a good way to visualize how well a model is calibrated. train_test_split is used to split your data into training and test sets. LogisticRegression is the main model we're going to use on our dummy data. roc_auc_score is a metric that quantifies how well the model is performing. brier_score_loss is kind of like calibration_curve: where calibration_curve shows how well a model is calibrated visually via a graph, brier_score_loss tells you how well a model is calibrated via a single number. And then we have a bunch of other common functions right over here.

Let's take the first case, where we create a classification data set with 10,000 samples. It's a balanced data set, which means there's an equal number of positive and negative labels. It has 10 features, all of them significant. In this cell I'm splitting it into train, dev, and test sets with an 80/10/10 split. We use the train set to train the model, the dev set (X_val and y_val) to calibrate the model, and the test set to actually test the model and get results. Looking at the distribution of labels in the train set, they're pretty even: of the 10,000 samples, 8,000 are in the train set, 1,000 in the validation set, and 1,000 in the test set, and of the 8,000 we have about 4,000 of each class, which is about right: 50/50.

So let's first consider the uncalibrated model case. We fit a logistic regression model on the training data and then make some predictions; y_pred is basically a list of probabilities. If I look at the distribution of probabilities, about half of them are below 0.47 and half are above 0.47, which makes sense. In this case the ROC AUC is about 94.2%, a pretty good model, so we'll roll with it. The Brier score loss is 0.0922. Mathematically, the Brier score takes the difference between the true test label and the predicted probability, squares it, and averages those squares over all samples, so lower is better.

This little plot is the calibration curve. Like I said before, it shows how well a model is calibrated; ideally it should look very similar to y = x. Let me explain the two axes. The x-axis is the predicted probability of the positive class, and the y-axis is the fraction of positives, that is, what percentage of the samples with that predicted probability actually have positive labels. Ideally the two should be equal.

A good way to understand what calibration_curve really does behind the scenes is to open its implementation in the GitHub repo, which I have right over here. We passed in n_bins equal to 10 (the default is 5), so it takes the range from 0 to 1 and segments it into ten equal parts: 0 to 0.1 is one bin, 0.1 to 0.2 is another bin, and so on. That happens right here in the code on line 875, where we're creating equal bins. Then we take all 1,000 validation examples and put each one into the bin where its predicted probability lies: if it lies between 0 and 0.1, it goes in the first bin; if it lies between 0.1 and 0.2, it goes in the second bin. Then we compute what goes on the x and y axes. prob_true is the fraction of samples in each bin that belong to the positive class. prob_pred is the average of the probability values the model returned for the samples in each bin. In every bin these two should be as close to each other as possible, which is why the curve should ideally be a straight line. And when we look here, it does look pretty straight, almost y = x, which means the values returned by this logistic regression model are pretty representative of probabilities, or pretty close to that.

All right, now that we have the uncalibrated model set, what happens if we calibrate the model on this balanced data set? We take clf, the trained classifier, and pass it into CalibratedClassifierCV. What we're saying here is: we've already pre-fit this model clf, so just apply an isotonic regression on top of it to calibrate the model, using the validation data, which is another set of 1,000 examples. Then we make predictions with predict_proba, and you can see that the distribution of the calibrated model's predictions is pretty similar to what we saw previously, right up here. Before, the median was 0.47 and now it's 0.5, which honestly isn't much of a difference. The AUC is similar too, and the Brier score is very comparable: 0.092, which I think is the same as before. So calibration doesn't really do much here, and we still get a curve that's very similar to y = x.

By the way, I have to correct myself real quick. I think I mentioned that this curve was created from the 1,000 samples of the validation set. That's wrong: it was computed from the 1,000 samples of the test set. And this one is also computed from the test set, because we calibrate the model using the validation set and then make predictions on the test set, and the plot is made from the test set. I think I've repeated myself three times there, but that's okay as long as we all understand. So calibration didn't really do much here, because it's a well-balanced data set and logistic regression is pretty good at returning probabilities.

This next part is an extra thing I've seen: a not-so-recommended approach of using your training set to also calibrate your model. It's probably not the best approach, because you're training and calibrating on the same data, and this may lead to certain biases. But I've seen it in certain tutorials out there, so I'm including it here anyway as something to refer to.

All right, moving on to the unbalanced data set case. These are cases like fraud data, where you might have only a few cases of fraud but an abundance of normal transactions. Here I'm also creating 10,000 examples with 10 features, all of them significant; of these 10,000, 1,000 are positive and the other 9,000 are negative. I'm again doing an 80/10/10 split into train, test, and validation sets, and you can see we still have a 1:9 ratio, which agrees with the weights we've given, so it's representative of what we would see with something like fraud data.

Like we did for the balanced case, let's look at what happens with an uncalibrated model. We use a logistic regression and pass in the parameter class_weight equal to "balanced". Because it's an unbalanced data set, this weights the positive examples about nine times more than the negative examples. It's done so that the model is better able to pick up on the positive examples, and it's kind of a requirement here. Once that's done, we fit the model and make predictions. The AUC is about 90%, pretty good, and the Brier score is 0.087; okay, that's fine. Now, when we describe the predictions of this uncalibrated model, we can see that 50% of them are under about 10%. Keep this in mind, because we'll be comparing it later to the calibrated model, and you'll see the difference in probabilities.

If we create the calibration curve on the test set, you can see that it now deviates a lot from y = x, that straight diagonal line. When I look at this plot, I see that the model is really not that well calibrated, which means the probability values being returned in y_pred_df are not very representative of true probabilities.

So what do we do here? Let's try to calibrate the model. We have our classifier again, and we pass it to our calibrator, CalibratedClassifierCV. We calibrate the model with the validation data set and then make predictions. Now we have an AUC that's not too different from before, but look at our Brier score: it's now 0.05, which is definitely better than the 0.08 we saw previously. And when you look at the predictions returned by our calibrated model, the median is about 1.5%; before, the median was 10%. So the probability values have decreased substantially compared to the uncalibrated case, and this is what I hinted at in the explanation before I showed all this code. These should be more representative of true probabilities. Why is that the case? If we look at the calibration curve now, it's much closer to y = x. And this last part is the same case I mentioned before, where training and calibration happen on one set of data at the same time.

So that's kind of all about model calibration. An interesting place where you would use this is anywhere you really need the predicted probability values to be representative of actual probabilities, like expectation problems, when you're finding the expected value of, perhaps, one of your features. I've illustrated this in detail in another video on expectations, which I think came out before this one, so if you want to check out a cool implementation using probabilities and calibration, I suggest you check that video out. Other than that, I have some references down in the description below, or rather right here at the end of this notebook, and this code will be available on GitHub, link also in the description below. So please comment, like, subscribe, do everything you need to do to get the word out. I'm trying to grow a good channel here, so stay tuned, stay safe, and I'll see you later. Bye bye.
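As a recap of the two diagnostics used throughout the walkthrough, the Brier score and the binning behind the calibration curve can be sketched in plain Python. This is a simplified illustration, not scikit-learn's implementation: the function names `brier_score` and `calibration_bins` are hypothetical, the toy labels and probabilities are made up, and the handling of values that fall exactly on a bin edge may differ from `calibration_curve`.

```python
def brier_score(y_true, y_prob):
    """Mean squared difference between predicted probability and the 0/1 label."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def calibration_bins(y_true, y_prob, n_bins=10):
    """Return (fraction of positives, mean predicted probability) per non-empty bin."""
    sums_true = [0.0] * n_bins
    sums_prob = [0.0] * n_bins
    counts = [0] * n_bins
    for y, p in zip(y_true, y_prob):
        # Equal-width bins over [0, 1]; p == 1.0 is placed in the last bin.
        b = min(int(p * n_bins), n_bins - 1)
        sums_true[b] += y
        sums_prob[b] += p
        counts[b] += 1
    # Empty bins are skipped so every returned value is a real average.
    prob_true = [sums_true[b] / counts[b] for b in range(n_bins) if counts[b]]
    prob_pred = [sums_prob[b] / counts[b] for b in range(n_bins) if counts[b]]
    return prob_true, prob_pred


# Toy data (made-up numbers, for illustration only).
y_true = [0, 0, 1, 0, 1, 1]
y_prob = [0.05, 0.15, 0.25, 0.45, 0.75, 0.85]

print("Brier score:", brier_score(y_true, y_prob))
prob_true, prob_pred = calibration_bins(y_true, y_prob, n_bins=2)
for frac_pos, mean_pred in zip(prob_true, prob_pred):
    print(f"mean predicted {mean_pred:.3f} -> fraction of positives {frac_pos:.3f}")
```

For a well-calibrated model, the two numbers printed for each bin should be close, which is exactly the y = x behavior the calibration plot checks for.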
Original Description
Model Calibration - EXPLAINED! Model Calibration. Fun!
SPONSOR
Kite is a free AI-powered coding assistant that will help you code faster and smarter. The Kite plugin integrates with all the top editors and IDEs to give you smart completions and documentation while you’re typing. I've been using Kite. Love it! Learn more here:
https://www.kite.com/get-kite/?utm_medium=referral&utm_source=youtube&utm_campaign=codeemporium&utm_content=description-only
CODE: https://github.com/ajhalthor/model-calibration