Why Logistic Regression DOESN'T return probabilities?!

CodeEmporium · Beginner ·📄 Research Papers Explained ·5y ago

Skills: Reading ML Papers 90% · Research Methods 90% · ML Maths Basics 80% · Supervised Learning 80%

Key Takeaways

The video discusses why logistic regression doesn't always return true probability values and explains model calibration using techniques like Platt scaling and isotonic regression, with code in scikit-learn that is also available on GitHub.

Full Transcript

In machine learning, classification problems are very common. The input is a set of features, and the output is a continuous value between 0 and 1 that we interpret as a probability. But in reality, these values aren't necessarily reflective of true probability values.

Now, a main question here is: how does that even happen? Well, the primary reason is data imbalance. In problems like fraud detection, where we have only a few positive samples, it makes sense to undersample the negative class or overweight the positive class when we train a model. We do this so that the model can detect these positive instances, but as a result the probabilities returned by the model may be skewed higher. To make sure that the values reflect true probabilities, we need to calibrate the model, and we'll see this in code later on too.

Now, how do we actually calibrate a model? Well, we train a model as we usually would for a classification problem. We then pipe its output into a calibration model, where it is the single feature, and the corresponding label is the actual label that was used for the original model too.
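
The two-stage setup described above can be sketched without any library. This is a minimal illustration under stated assumptions, not the notebook's code: the "base model" is a hypothetical stand-in whose log-odds are shifted up by 2, the single calibration feature is taken to be the log-odds of its output, and `fit_calibrator` is an illustrative helper fitted by plain gradient descent.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def logit(p):
    """Log-odds of a probability strictly between 0 and 1."""
    return math.log(p / (1.0 - p))


def mean(xs):
    return sum(xs) / len(xs)


def fit_calibrator(features, labels, lr=0.5, epochs=500):
    """Fit p = sigmoid(a * x + b) on one feature by gradient descent on log loss."""
    a, b = 1.0, 0.0
    n = len(features)
    for _ in range(epochs):
        grad_a = grad_b = 0.0
        for x, y in zip(features, labels):
            err = sigmoid(a * x + b) - y
            grad_a += err * x
            grad_b += err
        a -= lr * grad_a / n
        b -= lr * grad_b / n
    return a, b


# Hypothetical held-out set: true event rates are low, but the stand-in
# "base model" reports probabilities whose log-odds are shifted up by 2.
rng = random.Random(0)
true_p = [rng.uniform(0.02, 0.30) for _ in range(1000)]
labels = [1 if rng.random() < p else 0 for p in true_p]
raw_scores = [sigmoid(logit(p) + 2.0) for p in true_p]

# One feature per row (assumption: the log-odds of the base model's output);
# the target is the true label of the same held-out row.
features = [logit(s) for s in raw_scores]
a, b = fit_calibrator(features, labels)
calibrated = [sigmoid(a * x + b) for x in features]

print(f"observed positive rate: {mean(labels):.3f}")
print(f"mean raw score:         {mean(raw_scores):.3f}")
print(f"mean calibrated score:  {mean(calibrated):.3f}")
```

In this toy setup the raw scores average far above the observed positive rate, while the calibrated scores average close to it. The calibrator is fitted on held-out rows, not on the rows used to train the base model.
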
Now, there are two main types of calibration methods that are widely used. One of them is Platt scaling, and then we have isotonic regression. Platt scaling basically uses a logistic regressor and is better suited to simpler cases, whereas isotonic regression fits a more complex piecewise, non-decreasing function and can be used to find more complex relationships. You can try either method and see which works better, depending on your model and your data. Now, with that primer out of the way, let's get into some code.

Alrighty, so let's take a look at some code. I opened up a notebook, installed scikit-learn, and imported a bunch of things from it. First of all we have make_classification, which is used to create our classification dataset; I'm just going to be creating dummy data. CalibratedClassifierCV actually performs, behind the scenes, the calibration that I discussed previously. calibration_curve is a good way to visualize how well a model is calibrated. train_test_split is used to split your data into training and test sets. LogisticRegression is the main model that we're going to use on our dummy data. roc_auc_score is a metric that quantifies how well the model is performing. brier_score_loss is kind of like calibration_curve: calibration_curve is good for seeing how well a model is calibrated visually, via a graph, whereas brier_score_loss tells you how well a model is calibrated via a single number. And then we have a bunch of other common functions right over here.

So let's take the first case, where we create a classification dataset with 10,000 samples. It's a balanced dataset, which means there's an equal number of positive and negative labels. In this dataset we're going to have 10 features, and all of them are going to be significant; they're going to be important features. Now I'm
going to be splitting this up in this cell into train, dev, and test sets with an 80/10/10 split. We're using the train set to train the model; the dev set, X_val and y_val, we're going to use for calibrating the model; and the test set is for actually testing the model and getting these results. Just looking at the distribution of the labels in the train set, they're pretty even. Since there are 10,000 samples, 8,000 are in the train set, 1,000 in the validation set, and 1,000 in the test set. So in the 8,000 we have about 4,000 and 4,000, which is about right: fifty-fifty.

Okay, let's first consider the uncalibrated model case. Right now we're just going to fit a logistic regression model on the training data and then make some predictions. y_pred will basically be a list of probabilities. If I look at the distribution of probabilities, you can see that half of them are below 0.47 and half of them are above 0.47, which kind of makes sense. In this case the ROC AUC is about 94.2 percent. Pretty good model; we'll roll with it. The Brier score loss is 0.0922. Now, mathematically, the Brier score is the difference between the test label and the predicted probability, squared; it's just the average of those squares. So if it's lower, that means it's better, basically.

And this little plot is of the calibration curve. Like I said before, this plot just signifies how well a model is calibrated; ideally it should be very similar to y = x. I'll just explain what "probability of positives" and "fraction of positives" are. Probability of positives, on the x-axis, is the predicted probability of the positive label, and the y-axis is what percentage of those samples actually have positive labels, which ideally should be equal. A good way, though, to understand what
calibration_curve really does behind the scenes is just to open the GitHub repo, which I have right over here, with its implementation. Right now we passed in n_bins equal to 10, and the default is five. What it's going to do is take the entire range from 0 to 1 and segment it into ten equal parts: 0 to 0.1 is one bin, 0.1 to 0.2 is another bin, and so on. That happens, let me actually put that up right here in the code, right over here on line 875, where we're just creating equal bins. Then what we do later is take all of the 1,000 evaluation examples and put each one into the bin where its probability lies. So we have 1,000 samples: if one lies between 0 and 0.1, put it in the first bin; if it lies between 0.1 and 0.2, put it in the second bin.

Then we compute whatever was on the x- and y-axes. prob_true right here is basically saying, for every single bin, what fraction of the samples were of the positive class. And then the predicted probability, prob_pred: in every single bin, each sample corresponds to a probability value that was returned by the model, so what is the average for each of those bins? Now, in every case they should be as close to each other as possible, which is why ideally this should be a straight line. And when we actually look here, it does look pretty straight; it's almost y = x, which means that the values returned by this logistic regression function over here are pretty good. They are representative of probabilities, or pretty close to that.

All right, so now that we have the uncalibrated model set, what happens if we calibrate on this balanced dataset? Basically, we take clf, which is the trained classifier, and then we
pass it into CalibratedClassifierCV. What we're saying here is: hey, we've already pre-fit this model, clf, so all we're going to do is apply an isotonic regression on it and calibrate the model. How we calibrate it is using the validation data, which is another set of 1,000 examples. Then we make the predictions right here with predict_proba, and you can see that the distribution of the predictions of the calibrated model is pretty similar to what we saw previously, right up here. Before, 0.47 was the median, and now it's about 0.5, which honestly isn't much of a difference. The AUC is kind of similar too, and the Brier score is very comparable: 0.092, which I think was the same previously. It was. So basically calibration doesn't really do too much here, and we still get a curve that's very similar to y = x.

By the way, I have to correct myself real quick here. I think I mentioned that this curve was created from only the thousand samples of the validation set. That's wrong; it was actually computed from the thousand samples of the test set. I'm not sure if I clarified that correctly. Oh well, now you know: this is computed from the test set, and this is also computed from the test set, because we calibrate the model and then make predictions on the test set. We calibrate it using the validation set, though, but we make this plot from the test set. I think I've repeated myself three times there, but that's okay as long as we all understand.

So yeah, basically calibration didn't really do much here, because it's a well-balanced dataset and logistic regression is pretty good at returning probabilities. Now, this is kind of an extra thing that I've seen, a not-so-recommended approach of basically using your training set to also calibrate
your model. Probably not the best approach, again because you're training and calibrating on the same data, and this may lead to certain biases. But I've seen it in certain tutorials out there, so I'm throwing it out here anyway; it's good to at least have for reference.

All right, now moving on to the unbalanced dataset case. These are cases like fraud data, where you might have only a few cases of fraud but an abundance of normal transactions. In this case I'm also creating 10,000 examples with 10 features, all of them significant, and of these 10,000, 1,000 are positive and the other 9,000 are negative samples. Right here I'm doing a split into train, test, and validation sets again, 80/10/10, and you can see we still have a one-to-nine ratio, which agrees with the weights that we've given. So it's kind of representative of what we would see with fraud data.

Like we did for the balanced dataset case, let's look at what happens if we pass this into an uncalibrated model. We'll pass it into a logistic regression, and we're passing in a parameter, class_weight equal to "balanced". What this does is that, because it's an unbalanced dataset, the positive examples are going to be weighted about nine times more than the negative examples. This is done so that the model is better able to pick up on these positive examples, and it's also kind of a requirement. Once we're done there, we fit the model and make predictions. We see that the AUC is around 90 percent, pretty good, and the Brier score is 0.087. Okay, that's fine. Now, when we describe the predictions of just this uncalibrated model, we can see that 50 percent of them are under 10 percent. This is just something to keep in mind, because
we'll be comparing it later to the calibrated model case, and you'll see the difference in probabilities. Now, if we create the calibration curve on the test set, you can see that it deviates a lot from y = x, from that straight diagonal line. When I look at this plot, I see that the model is really not that calibrated, which means that the probability values being returned in y_pred_df are actually not very representative of probabilities.

So what do we do here? Let's try to calibrate the model. We have our classifier again, and we pass it to our calibrator, CalibratedClassifierCV. We calibrate the model with the validation dataset, and then we make predictions. Now we have an AUC that's not too different from before, but look at our Brier score: it's now 0.05, which is definitely better than the 0.08 that we saw previously. And when you look at the predictions returned by our calibrated model, the median is about 1.5 percent; before, the median was 10 percent. So you can see that the probability values have decreased considerably compared to the uncalibrated case, and this is what I hinted at back in the explanation before I showed all this code. These should be more representative of true probabilities. Why is that the case? Well, if we look at this calibration curve now, you can see it's much closer to y = x, so these values are actually more representative of true probabilities. And this is just the same case I mentioned before, where training and calibration happen on one set of data at the same time, in one shebang.

So yeah, that's kind of all about model calibration. An interesting place where you would use this is anywhere you really need absolute
probability values to be representative of actual probability values, like in expectation problems, when you're finding the expected value of, perhaps, one of your features. I've illustrated this in detail in another video on expectations; I think it was a video that came out before this one. So if you want to check out a cool implementation using probabilities and calibration, I suggest you check that video out. Other than that, I have some references down in the description below, or rather right here at the end of this notebook, and this code will be available on GitHub, link also in the description below. So yeah, please comment, like, subscribe, do everything you need to do to get the word out. I'm trying to grow a good channel here, so stay tuned, stay safe, and I'll see you later. Bye-bye.
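
The binning behind a calibration curve and the Brier score, as walked through in the transcript, can be sketched in a few lines. This is a simplified illustration with made-up toy predictions, not scikit-learn's implementation; the helper names `brier_score` and `calibration_bins` are hypothetical.

```python
def brier_score(y_true, y_prob):
    """Average squared difference between 0/1 labels and predicted probabilities."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def calibration_bins(y_true, y_prob, n_bins=10):
    """Split [0, 1] into equal-width bins and summarize each non-empty bin.

    Returns two lists: the fraction of positive labels per bin (y-axis)
    and the mean predicted probability per bin (x-axis).
    """
    label_sums = [0.0] * n_bins
    prob_sums = [0.0] * n_bins
    counts = [0] * n_bins
    for y, p in zip(y_true, y_prob):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 falls in the last bin
        label_sums[idx] += y
        prob_sums[idx] += p
        counts[idx] += 1
    frac_pos = [label_sums[i] / counts[i] for i in range(n_bins) if counts[i]]
    mean_pred = [prob_sums[i] / counts[i] for i in range(n_bins) if counts[i]]
    return frac_pos, mean_pred


# Toy predictions (made up for illustration): two samples in each of four bins.
y_true = [0, 0, 0, 1, 1, 1, 1, 0]
y_prob = [0.05, 0.08, 0.15, 0.12, 0.85, 0.88, 0.95, 0.92]

brier = brier_score(y_true, y_prob)
frac_pos, mean_pred = calibration_bins(y_true, y_prob, n_bins=10)

print("Brier score:", round(brier, 5))
for f, m in zip(frac_pos, mean_pred):
    print(f"mean predicted {m:.3f} -> fraction positive {f:.2f}")
```

A well-calibrated model would have each bin's fraction of positives close to its mean predicted probability, which is the y = x line mentioned in the transcript. A lower Brier score is better.
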

Original Description

Model Calibration - EXPLAINED! Model Calibration. Fun!

SPONSOR: Kite is a free AI-powered coding assistant that will help you code faster and smarter. The Kite plugin integrates with all the top editors and IDEs to give you smart completions and documentation while you’re typing. I've been using Kite. Love it! Learn more here: https://www.kite.com/get-kite/?utm_medium=referral&utm_source=youtube&utm_campaign=codeemporium&utm_content=description-only

CODE: https://github.com/ajhalthor/model-calibration


Playlist

Uploads from CodeEmporium · CodeEmporium · 59 of 60

Linear Regression and Multiple Regression
Logistic Regression - THE MATH YOU SHOULD KNOW!
Generative Adversarial Networks - FUTURISTIC & FUN AI !
Deep Learning on the Cloud - GPU TO LEARN FASTER
Deep Mind's AlphaGo Zero - EXPLAINED
Mask Region based Convolution Neural Networks - EXPLAINED!
Attention in Neural Networks
Depthwise Separable Convolution - A FASTER CONVOLUTION!
One Neural network learns EVERYTHING ?!
Neural Voice Cloning
AI creates Image Classifiers…by DRAWING?
Unpaired Image-Image Translation using CycleGANs
K-Means Clustering - EXPLAINED!
Random Forest Classification
Data Science in Finance
Hypothesis testing with Applications in Data Science
A/B Testing - Simply Explained
The Kernel Trick - THE MATH YOU SHOULD KNOW!
Support Vector Machines - THE MATH YOU SHOULD KNOW
Principal Component Analysis (PCA) - THE MATH YOU SHOULD KNOW!
History of Calculus - Animated
Curiosity in AI
DropBlock - A BETTER DROPOUT for Neural Networks
Autoencoders - EXPLAINED
Recurrent Neural Networks - EXPLAINED!
LSTM Networks - EXPLAINED!
Building an Image Captioner with Neural Networks
10 Machine Learning Questions - ANSWERED!
How do neural networks work?
Evolution of Face Generation | Evolution of GANs
How does Google Translate's AI work?
How to keep up with AI research?
How does YouTube recommend videos? - AI EXPLAINED!
Variational Autoencoders - EXPLAINED!
Logistic Regression - VISUALIZED!
Gradient Descent - THE MATH YOU SHOULD KNOW
Boosting - EXPLAINED!
Transformer Neural Networks - EXPLAINED! (Attention is all you need)
Loss Functions - EXPLAINED!
Optimizers - EXPLAINED!
NLP with Neural Networks & Transformers
Batch Normalization - EXPLAINED!
Activation Functions - EXPLAINED!
Data Scientist Answers Interview Questions
Why use GPU with Neural Networks?
How do GPUs speed up Neural Network training?
BERT Neural Network - EXPLAINED!
ConvNets Scaled Efficiently
Transformer Neural Net makes music! (JukeboxAI)
What do filters of Convolution Neural Network learn?
We're hosting a Machine Learning Conference!
MLconfEU 2020: Machine Learning Conference for Software Engineers
Are Neural Networks Intelligent?
Time Series Forecasting with Machine Learning
Few Shot Learning - EXPLAINED!
How does a Data Scientist Fight FRAUD?
How would a Data Scientist analyze Customer Churn?
Expectations with Machine Learning
Why Logistic Regression DOESN'T return probabilities?!
How you SHOULD code Machine Learning

The video teaches the importance of model calibration in logistic regression and explains how to use techniques like isotonic regression to improve model performance. It also discusses how to evaluate model calibration using metrics like calibration curve and Brier score loss. By watching this video, viewers can learn how to apply research methods to improve model calibration and make more accurate predictions.
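
For readers curious what an isotonic fit does, here is a minimal sketch of the pool-adjacent-violators idea on made-up scores. It is a simplified illustration assuming distinct scores, not the scikit-learn implementation used in the video; `isotonic_fit` is a hypothetical helper name.

```python
def isotonic_fit(scores, labels):
    """Pool-adjacent-violators: fit a non-decreasing step function to labels.

    Samples are ordered by score; whenever a block's mean label is lower than
    the mean of the block before it, the two blocks are pooled into one.
    Returns the sorted scores and the fitted value for each of them.
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    blocks = []  # each block is [sum_of_labels, count]
    for i in order:
        blocks.append([float(labels[i]), 1])
        while (len(blocks) > 1
               and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]):
            label_sum, count = blocks.pop()
            blocks[-1][0] += label_sum
            blocks[-1][1] += count
    fitted = []
    for label_sum, count in blocks:
        fitted.extend([label_sum / count] * count)
    return [scores[i] for i in order], fitted


# Toy held-out scores and true labels (made up for illustration).
scores = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8]
labels = [0, 0, 1, 0, 0, 1, 1, 1]

sorted_scores, fitted = isotonic_fit(scores, labels)
for s, f in zip(sorted_scores, fitted):
    print(f"score {s:.1f} -> calibrated {f:.3f}")
```

The fitted values never decrease as the score increases, and each pooled block takes the observed positive rate of the samples inside it.
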

Key Takeaways

Train a logistic regression model on a dataset
Evaluate model calibration using a calibration curve
Apply isotonic regression to calibrate the model
Use class weights to handle unbalanced datasets
Evaluate model performance using metrics like AUC and Brier score
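
As a small worked example of the class-weight takeaway: scikit-learn documents the "balanced" heuristic as n_samples / (n_classes * count of each class). The sketch below applies that formula in plain Python to the 1,000 positive / 9,000 negative split from the video; `balanced_class_weights` is a hypothetical helper name, not a library function.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * cnt) for cls, cnt in counts.items()}


# Same class mix as the unbalanced example: 1,000 positives, 9,000 negatives.
labels = [1] * 1000 + [0] * 9000
weights = balanced_class_weights(labels)
ratio = weights[1] / weights[0]

print("weights:", weights)
print("positive-to-negative weight ratio:", round(ratio, 2))
```

The ratio of the two weights matches the "about nine times more" weighting of positive examples mentioned in the transcript.
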

💡 Model calibration is crucial in logistic regression to ensure that predicted probabilities reflect true probabilities, and techniques like isotonic regression can be used to improve model calibration.

