How to handle Uncertainty in Deep Learning #2.2

DeepFindr · Beginner ·🧬 Deep Learning ·4y ago

Skills: Unsupervised Learning 80% · ML Maths Basics 70% · LLM Foundations 50%

Key Takeaways

The video demonstrates techniques for handling uncertainty in deep learning, including Monte Carlo dropout, deep ensembles, and Bayesian neural networks, using tools such as Pyro, BLiTZ, and PyTorch. It covers concepts like epistemic uncertainty, variational inference, and the reparameterization trick, with a focus on estimating uncertainty in model parameters and visualizing confidence bands.

Full Transcript

Welcome back to this last and final part of the uncertainty quantification series. In this video we want to have a look at how we can implement the methods for estimating epistemic uncertainty that we discussed in the last video. In particular, this refers to Bayesian neural networks, Monte Carlo dropout, and deep ensembles.

So this is the Colab notebook that we used last time for estimating aleatoric uncertainty with these different methods, and I've created a new section for epistemic uncertainty with a note: in order to run this section, you first have to execute this first cell, which generates the dataset that we used last time. Just as a quick reminder, we have a train set that is distributed like this, and we also have a test set that has a larger input range: the train set goes from -7 to +7 and the test set from -10 to +10. What we expect now for the epistemic models is that they report a high uncertainty in this area, because this is data that the model has never seen before. So we will use the same dataset; because of that, first execute this cell, and then you can jump down here and continue with this epistemic block.

Here we have three very basic implementations for Bayesian neural nets, Monte Carlo dropout, and deep ensembles. Let me emphasize that I didn't invest much time into hyperparameter tuning, so I'm pretty sure that you can get much more out of these models, but this is just a very basic variant of how you can implement them to make them report epistemic uncertainty for the predictions.

So let's have a look at Bayesian neural nets. Of course the question is how you implement that, because you have to put a distribution on the weights and also support backpropagation with this reparameterization trick. There are different choices for BNNs. You can use libraries like Pyro, which are probabilistic libraries, but there's also a very simple library called BLiTZ that I found on GitHub. BLiTZ supports these Bayesian layers, and all of them basically allow you to have a distribution on the weights and also support the full PyTorch backpropagation process. So we will quickly install BLiTZ, which is called blitz-bayesian-pytorch on pip, and we will also import some of the previous modules we used.

Now let's jump right into the model. The way this works is that we simply use the layers from blitz.modules and build our simple network with them. In this case we use Bayesian linear layers, which are simply fully connected layers. Here we have one single input feature, which is our x value, and we output one single prediction. With that we can build our network, and there's nothing different from a regular network. Now, in order to make this work, we need to put a decorator on this module, which is called variational_estimator, and this also comes from BLiTZ. With this we are able to calculate things like the loss, so this variational inference loss. By adding this we basically make it a Bayesian network.

We can also have a look at this network by simply printing it, and we see it's just like before, those three linear layers, but we also see that inside of these layers we have some additional things. Usually you would only have your weights. In this case we have two distributions for the weight and two distributions for the bias. On one hand we have the prior distributions, which model our prior knowledge and are probably just simple Gaussian distributions, and then we have these trainable random distributions. In order to make those distributions trainable, we need to apply the reparameterization trick. This means we sample outside of the network, which is all handled by BLiTZ, and we simply predict the parameters of these distributions in order to apply this variational inference.

Just like before, I also have a function that plots the predictions. In a Bayesian neural net, the way it works is that we just predict several times. Here we have a sample size of 100, which means we iterate over the samples and for each sample we give it the same input, and this gives us a list of predictions. What we can do now is calculate the mean and standard deviation of these predictions, and this gives us these confidence bands, which we can visualize with a little bit of plotting down here.

Because we previously added this decorator to our model, we now have access to some additional functions, and one of them is called sample_elbo. The idea is that we want to do sampling-based variational inference here, which means we want to approximate the posterior distribution and do this by sampling a loss. With this function we can sample the loss function for specific inputs. The loss function in our case is the evidence lower bound (ELBO) that I explained in the last video. It consists of two parts: one is the likelihood, which is our actual criterion, so in this case we use just the mean squared error, and the other one is the KL divergence, which checks how close we are to the prior distribution. With those two we can calculate the loss. There is also a different function, called sample_elbo explicit I think, and with this one we can actually get the individual losses for KL divergence and the likelihood. In this case both losses are combined according to this complexity cost weight, so this sort of weighs the KL divergence against our target criterion.

Long story short, we simply use this sample_elbo function on our model and pass it the inputs, the labels, the criterion, how many times we want to sample, and this weight for our cost between KL divergence and likelihood. I have a loop here that runs over 100 epochs, and every 10 epochs I have a test loop that simply tests the predictions and plots them, just like we had it before.

Now let's have a look at these predictions. This is the result after 90 epochs of training, and we see that the uncertainty outside of our distribution, so on the left of this bar and on the right of this one, is quite high compared to the uncertainty in the middle. What makes me wonder a bit is why this area and this area also seem to have a high uncertainty, because the model has seen a lot of data points there, and therefore I think the uncertainty should be lower. So I'm not 100% satisfied with this result, but this is what I got after a couple of hours of tweaking parameters.

The next method we want to have a look at is Monte Carlo dropout, and this one is actually pretty straightforward to implement. The only thing we need is a dropout layer, and we can use this dropout layer several times because it's not specific to one of these layers. Here I selected a dropout rate of 0.2. I read that in the paper they also use higher dropout rates like 0.5, but I found lower values to work better in my case.

Just like before, I also have a plotting function, and the important part here is that we use the model in training mode. Typically, if you put your model into test mode by calling model.eval(), you turn off dropout, and in this case we want to use dropout for our predictions as well. Just like before, we sample several times, in this case 100, and each time we get a different dropout set. Because of that we get some variation in the parameters, and using that we can get an estimate of the uncertainty in our parameters. We use the means and standard deviations of this sampled output distribution and plot them just like before.

Nothing new here: we train a model in a train and test loop and print the results. After almost 100 iterations we get this picture, where we see that the uncertainty in the middle is again lower than the uncertainty outside of the training distribution. Again, this is not perfect, but I think it shows a tendency that we have a higher uncertainty outside of the distribution and also for specific areas of the inputs, but a very high certainty in this middle area compared to that.

The last method on today's agenda is deep ensembles, and those are also straightforward to implement. Basically it's just an ensemble of different networks, and for that I've created a simple network that I've also used before. This network actually predicts mu and sigma, or variance, and this means we can use this model to predict aleatoric uncertainty and use the ensemble to predict epistemic uncertainty. The idea now is that we have several of these models, and for each of these models I get the predictions. Eventually we use the variance across all of these predictions, and again I can use them to plot these sort of confidence bands.

What does this look like in practice? We just define a number of models, and here I simply stack these models according to how many models I want to have. For each of these models we can use the same loss function, but we have to consider that we need to use different optimizers, because each optimizer uses the parameters of one specific model. In the training loop we can now simply iterate over this deep ensemble, and for each of the models we can get the predictions. We simply optimize each model individually but then report a common loss for the whole ensemble. In this test loop we simply plot the results, and again we iterate over the models. I have to say that I've also seen different approaches that might do it more efficiently, but this is the most straightforward way to implement it.

The results of that look like this. Here we have a similar picture as before: we have a high uncertainty outside of the distribution and a lower one in the middle of the distribution, and again we have some ranges where the model also reports high uncertainty. I still don't know why this happens; I would need to investigate this further, but as this was a very simple tutorial with this dummy dataset, I didn't go further into detail. Generally it shows that we can certainly capture especially out-of-distribution uncertainty and also the areas where our model is quite certain.

So that's all for this uncertainty in deep learning series. I hope that you found it interesting or helpful. Feel free to leave a comment if you have any questions about the models or the implementation. I think that I will do some more videos in that direction in the future, because I find uncertainty in deep learning a very interesting and important topic. Let me know what you think about it, and I'll see you soon in a future video.
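The Monte Carlo dropout procedure described in the transcript (one shared dropout layer with rate 0.2, the model kept in training mode at prediction time, 100 samples, then mean and standard deviation) can be sketched roughly as follows. This is a minimal illustration and not the notebook's code: the class name `DropoutRegressor`, the helper `mc_dropout_predict`, the hidden size, and the evenly spaced test inputs are all assumptions made for the example, and the model here is left untrained.

```python
import torch
from torch import nn

torch.manual_seed(0)


class DropoutRegressor(nn.Module):
    """Small MLP (hypothetical); one dropout module is reused after each hidden layer."""

    def __init__(self, hidden=32, p=0.2):
        super().__init__()
        self.fc1 = nn.Linear(1, hidden)
        self.fc2 = nn.Linear(hidden, hidden)
        self.out = nn.Linear(hidden, 1)
        self.dropout = nn.Dropout(p)  # has no weights, so it can be shared

    def forward(self, x):
        x = self.dropout(torch.relu(self.fc1(x)))
        x = self.dropout(torch.relu(self.fc2(x)))
        return self.out(x)


def mc_dropout_predict(model, x, n_samples=100):
    """Run n_samples stochastic forward passes and return their mean and std."""
    model.train()  # keeps dropout active; model.eval() would switch it off
    with torch.no_grad():
        preds = torch.stack([model(x) for _ in range(n_samples)])
    return preds.mean(dim=0), preds.std(dim=0)


model = DropoutRegressor()  # untrained, for illustration only
x_test = torch.linspace(-10, 10, 50).unsqueeze(1)
mean, std = mc_dropout_predict(model, x_test)
lower, upper = mean - 2 * std, mean + 2 * std  # band that could be plotted
```

Note that `model.train()` also puts layers such as batch normalization into training mode, so a network that contains them may need dropout enabled more selectively.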

Original Description

▬▬ Code ▬▬▬▬▬
Colab Notebook: https://colab.research.google.com/drive/1AE7g0BDQDcCDx8nzB8dRcm_VyonqhaWG?usp=sharing

▬▬ Used Music ▬▬▬▬▬▬▬▬▬▬▬
Music from Uppbeat (free for Creators!): https://uppbeat.io/t/pryces/lateflights
License code: 3O8NFX8WUHJBR2SB

▬▬ Used Videos ▬▬▬▬▬▬▬▬▬▬▬
Clouds, Kelly L from Pexels

▬▬ Timestamps ▬▬▬▬▬▬▬▬▬▬▬
00:00 Introduction
00:23 Notebook execution notice
01:50 Bayesian Neural Network
08:00 Monte Carlo Dropout
10:15 Deep Ensemble
13:02 Summary

▬▬ Support me if you like 🌟
►Link to this channel: https://bit.ly/3zEqL1W
►Support me on Patreon: https://bit.ly/2Wed242
►Buy me a coffee on Ko-Fi: https://bit.ly/3kJYEdl
►E-Mail: deepfindr@gmail.com

▬▬ My equipment 💻
- Microphone: https://amzn.to/3DVqB8H
- Microphone mount: https://amzn.to/3BWUcOJ
- Monitors: https://amzn.to/3G2Jjgr
- Monitor mount: https://amzn.to/3AWGIAY
- Height-adjustable table: https://amzn.to/3aUysXC
- Ergonomic chair: https://amzn.to/3phQg7r
- PC case: https://amzn.to/3jdlI2Y
- GPU: https://amzn.to/3AWyzwy
- Keyboard: https://amzn.to/2XskWHP
- Bluelight filter glasses: https://amzn.to/3pj0fK2


This video teaches techniques for handling uncertainty in deep learning, including Monte Carlo dropout, deep ensembles, and Bayesian neural networks. It covers concepts like epistemic uncertainty, variational inference, and the reparameterization trick, with a focus on estimating uncertainty in model parameters and visualizing confidence bands. By watching this video, viewers can learn how to implement these techniques using tools like Pyro, BLiTZ, and PyTorch.

Key Takeaways

Execute the first cell to generate the dataset
Use the new section for epistemic uncertainty in the Colab notebook
Implement Bayesian neural networks with Pyro or BLiTZ
Apply the reparameterization trick so that backpropagation works in Bayesian neural networks
Use the variational inference loss for Bayesian neural networks
Sample outside the network using the reparameterization trick
Predict the parameters of the weight distributions
Calculate the mean and standard deviation of the sampled predictions
Use the sample_elbo function to sample the loss
Combine KL divergence and likelihood with the complexity cost weight
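To make the reparameterization and loss-weighting takeaways above more concrete, here is a small plain-PyTorch sketch of what a Bayesian linear layer and a sampled ELBO-style loss might look like. It is not BLiTZ's implementation: `TinyBayesLinear`, `sampled_elbo_loss`, the zero-mean Gaussian prior, the initial values, and the `kl_weight` of 0.01 (standing in for the complexity cost weight) are all assumptions chosen for illustration, and the sine targets are placeholder data rather than the notebook's dataset.

```python
import torch
from torch import nn
import torch.nn.functional as F

torch.manual_seed(0)


class TinyBayesLinear(nn.Module):
    """Hypothetical layer: Gaussian posterior per weight, N(0, prior_std^2) prior."""

    def __init__(self, n_in, n_out, prior_std=1.0):
        super().__init__()
        # The trainable quantities are distribution parameters, not weights
        self.w_mu = nn.Parameter(torch.randn(n_out, n_in) * 0.1)
        self.w_rho = nn.Parameter(torch.full((n_out, n_in), -3.0))
        self.b_mu = nn.Parameter(torch.zeros(n_out))
        self.b_rho = nn.Parameter(torch.full((n_out,), -3.0))
        self.prior_std = prior_std
        self.kl = torch.tensor(0.0)

    def _kl(self, mu, sigma):
        # Closed-form KL(N(mu, sigma^2) || N(0, prior_std^2)), summed over entries
        p = self.prior_std
        return (torch.log(p / sigma) + (sigma**2 + mu**2) / (2 * p**2) - 0.5).sum()

    def forward(self, x):
        w_sigma = F.softplus(self.w_rho)  # softplus keeps sigma positive
        b_sigma = F.softplus(self.b_rho)
        # Reparameterization: the noise is sampled separately from the
        # parameters, so gradients can flow into mu and rho
        w = self.w_mu + w_sigma * torch.randn_like(w_sigma)
        b = self.b_mu + b_sigma * torch.randn_like(b_sigma)
        self.kl = self._kl(self.w_mu, w_sigma) + self._kl(self.b_mu, b_sigma)
        return F.linear(x, w, b)


def sampled_elbo_loss(model, x, y, n_samples=3, kl_weight=0.01):
    """Average (MSE + kl_weight * KL) over several stochastic forward passes."""
    total = 0.0
    for _ in range(n_samples):
        pred = model(x)  # each pass draws new weights
        kl = sum(m.kl for m in model.modules() if isinstance(m, TinyBayesLinear))
        total = total + F.mse_loss(pred, y) + kl_weight * kl
    return total / n_samples


model = nn.Sequential(TinyBayesLinear(1, 16), nn.ReLU(), TinyBayesLinear(16, 1))
x = torch.linspace(-7, 7, 64).unsqueeze(1)
y = torch.sin(x)  # placeholder targets, not the notebook's data
loss = sampled_elbo_loss(model, x, y)
loss.backward()  # gradients reach both the means and the rho parameters
```

Raising `kl_weight` pulls the weight distributions toward the prior, while lowering it lets the mean squared error dominate, which is the trade-off the complexity cost weight controls.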

💡 The video highlights the importance of estimating uncertainty in model parameters and visualizing confidence bands, and demonstrates how to use techniques like Monte Carlo dropout, deep ensembles, and Bayesian neural networks to achieve this.
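For deep ensembles, where each member predicts a mean and a variance, one common way to turn the member outputs into a confidence band is the law of total variance: the average predicted variance is read as aleatoric uncertainty and the spread of the predicted means as epistemic uncertainty. The sketch below uses only the standard library; `combine_ensemble` and the member outputs are made-up illustrations, and the notebook may aggregate its predictions differently.

```python
import math
from statistics import fmean, pvariance


def combine_ensemble(mus, variances):
    """Combine per-member Gaussian predictions for one input.

    mus: predicted means, one per ensemble member
    variances: predicted variances, one per ensemble member
    Returns (mean, aleatoric variance, epistemic variance).
    """
    mean = fmean(mus)
    aleatoric = fmean(variances)      # average noise level the members predict
    epistemic = pvariance(mus, mean)  # disagreement between the members
    return mean, aleatoric, epistemic


# Hypothetical outputs of a 5-member ensemble for two inputs
in_dist = combine_ensemble([1.00, 1.02, 0.98, 1.01, 0.99], [0.04] * 5)
out_dist = combine_ensemble([0.2, 1.5, -0.8, 2.1, 0.5], [0.04] * 5)

for name, (mean, alea, epi) in [("in-distribution", in_dist),
                                ("out-of-distribution", out_dist)]:
    half_width = 2 * math.sqrt(alea + epi)  # two standard deviations
    print(f"{name}: mean={mean:.2f}, epistemic={epi:.4f}, band=+/-{half_width:.2f}")
```

In this made-up example the members agree closely on the first input and disagree on the second, so only the second input gets a wide band even though the predicted noise is the same for both.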



Chapters (6)

0:00 Introduction

0:23 Notebook execution notice

1:50 Bayesian Neural Network

8:00 Monte Carlo Dropout

10:15 Deep Ensemble

13:02 Summary
