How to handle Uncertainty in Deep Learning #2.2

DeepFindr · Beginner · 🧬 Deep Learning · 4y ago

Key Takeaways

The video demonstrates how to implement three techniques for estimating epistemic uncertainty in deep learning: Bayesian neural networks, Monte Carlo dropout, and deep ensembles. It uses PyTorch and the Blitz library (blitz-bayesian-pytorch), and mentions Pyro as an alternative. It covers concepts like epistemic uncertainty, variational inference, and the reparameterization trick, with a focus on estimating uncertainty in model parameters and visualizing confidence bands.
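The reparameterization trick mentioned above can be illustrated without any deep learning library. The NumPy sketch below is my own illustration, not code from the video or from Blitz: it samples a single Gaussian weight as `w = mu + sigma * eps` and uses those samples to estimate the gradients of `E[w**2]` with respect to `mu` and `sigma`. The helper names `reparam_sample` and `pathwise_grads` and the toy objective `w**2` are hypothetical choices made only for this example.

```python
import numpy as np

def reparam_sample(mu, sigma, rng, n_samples):
    """Reparameterization trick: w = mu + sigma * eps with eps ~ N(0, 1).

    All randomness sits in eps, which is drawn independently of (mu, sigma),
    so each sample w is a deterministic, differentiable function of mu and sigma.
    """
    eps = rng.standard_normal(n_samples)
    return mu + sigma * eps, eps

def pathwise_grads(mu, sigma, rng, n_samples=200_000):
    """Monte Carlo estimate of d/dmu and d/dsigma of E[w**2] for w ~ N(mu, sigma**2).

    With eps fixed, d(w**2)/dmu = 2*w and d(w**2)/dsigma = 2*w*eps.
    Since E[w**2] = mu**2 + sigma**2, the exact gradients are 2*mu and 2*sigma.
    """
    w, eps = reparam_sample(mu, sigma, rng, n_samples)
    grad_mu = np.mean(2.0 * w)
    grad_sigma = np.mean(2.0 * w * eps)
    return float(grad_mu), float(grad_sigma)

rng = np.random.default_rng(0)
g_mu, g_sigma = pathwise_grads(mu=1.5, sigma=0.5, rng=rng)
print(f"d/dmu ~ {g_mu:.3f} (exact 3.0), d/dsigma ~ {g_sigma:.3f} (exact 1.0)")
```

In a framework like PyTorch, autograd computes these pathwise derivatives automatically, which is what lets the weight distributions of a Bayesian layer be trained with ordinary backpropagation.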

Full Transcript

Welcome back to this last and final part of the uncertainty quantification series. In this video we want to have a look at how we can implement the methods for estimating epistemic uncertainty that we discussed in the last video, in particular Bayesian neural networks, Monte Carlo dropout and deep ensembles. This is the Colab notebook that we used last time for estimating aleatoric uncertainty with these different methods, and I've created a new section for epistemic uncertainty with a note: to run this section, you first have to execute the first cell, which generates the dataset we used last time. Just as a quick reminder, we have a train set that is distributed like this, and a test set with a larger input range: the train set goes from -7 to +7 and the test set from -10 to +10. What we now expect from the epistemic models is that they report high uncertainty in the outer regions, because this is data the model has never seen before. We will use the same dataset, so first execute this cell, then jump down here and continue with the epistemic block. Here we have three very basic implementations: Bayesian neural nets, Monte Carlo dropout and deep ensembles. Let me emphasize that I didn't invest much time into hyperparameter tuning, so I'm pretty sure you can get much more out of these models; this is just a very basic way to implement them so that they report epistemic uncertainty for their predictions. So let's have a look at Bayesian neural nets. Of course, the question is how to implement them, because you have to put a distribution on the weights and also support backpropagation with the reparametrization trick. There are different choices for BNNs: you can use probabilistic programming libraries like Pyro, but there's also a very simple library on GitHub called Blitz, and Blitz supports these Bayesian
layers, all of which basically allow you to have a distribution on the weights while supporting the full PyTorch backpropagation process. We quickly install Blitz, which is called blitz-bayesian-pytorch on pip, and we also import some of the modules we used previously. Now let's jump right into the model. The way this works is that we simply use the layers from blitz.modules to build our simple network. In this case we use BayesianLinear layers, which are simply fully connected layers. We have one single input feature, which is our x value, and we output one single prediction, and apart from that there's nothing different from a regular network. To make this work, we need to add a decorator to this module called variational_estimator, which also comes from Blitz. With it we are able to calculate things like the variational inference loss, so by adding this decorator we basically make it a Bayesian network. We can also have a look at this network by simply printing it: it has the same three linear layers as before, but inside these layers there are some additional things. Usually you would only have your weights; in this case we have two distributions for the weights and two distributions for the bias. On the one hand there are the prior distributions, which model our prior knowledge and are probably just simple Gaussian distributions, and on the other hand there are the trainable random distributions. To make those distributions trainable we need to apply the reparametrization trick: we sample outside of the network, which is all handled by Blitz, and simply predict the parameters of these distributions in order to apply variational inference. Just like before, I also have a function that plots the predictions. In a Bayesian neural net, we simply predict several times; here we have a sample size of
100, which means we iterate over the samples, give the same input each time, and get a list of predictions. We can then calculate the mean and standard deviation of these predictions, which gives us the confidence bands that we visualize with a bit of plotting down here. Because we previously added the decorator to our model, we now have access to some additional functions, one of which is called sample_elbo. The idea is that we want to do sampling-based variational inference: we approximate the posterior distribution by sampling a loss, and with this function we can sample the loss for specific inputs. The loss function in our case is based on the evidence lower bound (ELBO) that I explained in the last video. It consists of two parts: one is the likelihood, which is our actual criterion, in this case just the mean squared error, and the other is the KL divergence, which measures how close we are to the prior distribution. With those two we can calculate the loss. There is also a different function (I think it's called something like "sample elbo explicit") with which we can get the individual losses for the KL divergence and the likelihood. Both losses are combined according to the complexity cost weight, which weighs the KL divergence against our target criterion. Long story short, we simply call sample_elbo on our model and pass it the inputs, the labels, the criterion, how many times we want to sample, and this weight between KL divergence and likelihood. I have a loop here that runs over 100 epochs, and every 10 epochs a test loop checks the predictions and plots them, just like before. Now let's have a look at these predictions. This is the result after 90 epochs of training, and we see that the uncertainty outside of our distribution, so on the left of this bar and on the right of
this one, is quite high compared to the uncertainty in the middle. What makes me wonder a bit is why these two areas also seem to have high uncertainty, because the model has seen a lot of data points there, so I think the uncertainty should be lower. So I'm not 100% satisfied with this result, but this is what I got after a couple of hours of tweaking parameters. The next method we want to have a look at is Monte Carlo dropout, which is actually pretty straightforward to implement. The only thing we need is a dropout layer, and we can reuse the same dropout layer several times because it has no parameters of its own and is not tied to any specific layer. Here I selected a dropout rate of 0.2. I read that the paper also uses higher dropout rates such as 0.5, but I found lower values to work better in my case. Just like before, I also have a plotting function, and the important part here is that we use the model in training mode: if you put your model into evaluation mode by calling model.eval(), you turn off dropout, but here we want dropout to stay active for our predictions as well. Just like before, we sample several times, in this case 100; each time we get a different dropout mask, which introduces variation in the effective parameters, and from that we can estimate the uncertainty in our parameters. We take the mean and standard deviation of this sampled output distribution and plot them just like before. Nothing new here: we train the model in a train and test loop and print the results. After almost 100 iterations we get this picture, where the uncertainty in the middle is again lower than the uncertainty outside of the training distribution. Again, this is not perfect, but I think it shows a tendency toward higher uncertainty outside of the distribution (and also for specific areas of the inputs) and very high certainty in the middle area compared to those. Now the
last model on today's agenda is deep ensembles, which are also straightforward to implement: basically, it's just an ensemble of different networks. For that I've created a simple network that I've also used before. This network predicts mu and sigma (the mean and the spread of a Gaussian), which means we can use each model to predict aleatoric uncertainty and use the ensemble to predict epistemic uncertainty. The idea is that we have several of these models, we get the predictions from each of them, and eventually we use the variance across all of these predictions, which again lets us plot these confidence bands. What does this look like in practice? We just define a number of models; here I simply stack as many models as I want to have. For each of these models we can use the same loss function, but we need a separate optimizer for each one, because each optimizer holds the parameters of one specific model. In the training loop we simply iterate over the deep ensemble, get the predictions for each model, optimize each model individually, and report a common loss for the whole ensemble. In the test loop we plot the results, again iterating over the models. I have to say that I've also seen other approaches that might do this more efficiently, but this is the most straightforward way to implement it. The results look like this: we have a similar picture as before, with high uncertainty outside of the distribution and lower uncertainty in the middle of the distribution. Again there are some ranges where the model also reports high uncertainty, and I still don't know why this happens; I would need to investigate further, but as this was a very simple tutorial with a dummy dataset, I didn't go into more detail. Generally, though, it shows that we can capture especially out-of-distribution uncertainty
and also the areas where our model is quite certain. That's all for this uncertainty in deep learning series. I hope you found it interesting or helpful, and feel free to leave a comment if you have any questions about the models or the implementation. I think I will do more videos in this direction in the future, because I find uncertainty in deep learning a very interesting and important topic. Let me know what you think about it, and see you soon in a future video.
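The step where the deep ensemble's predictions are merged into one confidence band can be sketched as follows. Assuming, as described in the transcript, that each member outputs a mean `mu` and a standard deviation `sigma` per input, a common approach is to treat the ensemble as an equally weighted mixture of Gaussians and match its mean and variance: the average of the members' variances is the aleatoric part and the variance of the members' means is the epistemic part. The notebook may combine them differently; the function name `combine_ensemble` and the toy numbers are made up for illustration and are not outputs of the video.

```python
import numpy as np

def combine_ensemble(mus, sigmas):
    """Merge M Gaussian predictions N(mu_m, sigma_m^2) into one mean and variance.

    mus, sigmas: array-likes of shape (M, N) for M ensemble members and N inputs.
    Returns (mean, aleatoric_var, epistemic_var), each of shape (N,).
    """
    mus = np.asarray(mus, dtype=float)
    sigmas = np.asarray(sigmas, dtype=float)
    mean = mus.mean(axis=0)                  # ensemble prediction
    aleatoric = (sigmas ** 2).mean(axis=0)   # average noise variance the members predict
    epistemic = mus.var(axis=0)              # spread of the members' means (disagreement)
    return mean, aleatoric, epistemic

# Toy numbers for 3 members and 2 inputs: the members agree on input 0
# and disagree on input 1 (think: a point far outside the training range).
mus = [[1.0, 0.0],
       [1.0, 2.0],
       [1.0, 4.0]]
sigmas = [[0.5, 0.5],
          [0.5, 0.5],
          [0.5, 0.5]]

mean, aleatoric, epistemic = combine_ensemble(mus, sigmas)
total_std = np.sqrt(aleatoric + epistemic)   # half-width of a +/- 1 std confidence band
print(mean, aleatoric, epistemic, total_std)
# mean = [1, 2], aleatoric = [0.25, 0.25], epistemic = [0, 8/3]
```

Plotting `mean ± total_std` (or only the epistemic part) over the test inputs gives bands of the kind shown in the video; in the toy example, the band is wider only where the members disagree.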

Original Description

▬▬ Code ▬▬▬▬▬
Colab Notebook: https://colab.research.google.com/drive/1AE7g0BDQDcCDx8nzB8dRcm_VyonqhaWG?usp=sharing

▬▬ Used Music ▬▬▬▬▬▬▬▬▬▬▬
Music from Uppbeat (free for Creators!): https://uppbeat.io/t/pryces/lateflights
License code: 3O8NFX8WUHJBR2SB

▬▬ Used Videos ▬▬▬▬▬▬▬▬▬▬▬
Clouds, Kelly L from Pexels

▬▬ Timestamps ▬▬▬▬▬▬▬▬▬▬▬
00:00 Introduction
00:23 Notebook execution notice
01:50 Bayesian Neural Network
08:00 Monte Carlo Dropout
10:15 Deep Ensemble
13:02 Summary

▬▬ Support me if you like 🌟
►Link to this channel: https://bit.ly/3zEqL1W
►Support me on Patreon: https://bit.ly/2Wed242
►Buy me a coffee on Ko-Fi: https://bit.ly/3kJYEdl
►E-Mail: deepfindr@gmail.com

▬▬ My equipment 💻
- Microphone: https://amzn.to/3DVqB8H
- Microphone mount: https://amzn.to/3BWUcOJ
- Monitors: https://amzn.to/3G2Jjgr
- Monitor mount: https://amzn.to/3AWGIAY
- Height-adjustable table: https://amzn.to/3aUysXC
- Ergonomic chair: https://amzn.to/3phQg7r
- PC case: https://amzn.to/3jdlI2Y
- GPU: https://amzn.to/3AWyzwy
- Keyboard: https://amzn.to/2XskWHP
- Bluelight filter glasses: https://amzn.to/3pj0fK2

Key Takeaways
  1. Run the first notebook cell to generate the dataset before running the epistemic uncertainty section
  2. Continue in the new epistemic uncertainty section of the Colab notebook
  3. Implement Bayesian neural networks with a library such as Blitz (blitz-bayesian-pytorch); Pyro is an alternative
  4. Apply the reparameterization trick to enable backpropagation in Bayesian neural networks
  5. Train Bayesian neural networks with a variational inference (ELBO) loss
  6. Sample the noise outside the network via the reparameterization trick (handled by Blitz)
  7. Learn the parameters of the weight and bias distributions
  8. Calculate the mean and standard deviation of repeated predictions to obtain confidence bands
  9. Use the sample_elbo method to sample the loss function
  10. Combine the KL divergence and the likelihood with the complexity cost weight
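Takeaways 8 to 10 can be made concrete with a tiny Bayesian linear regression `y = w * x + b`. This NumPy sketch is not Blitz's `sample_elbo`; it only mimics the idea with the hypothetical helpers `sampled_elbo_loss` and `predictive_mean_std`, whose parameter names echo the ones mentioned in the video. Each of `sample_nbr` draws samples `[w, b]` with the reparameterization trick, computes the MSE data term, and adds the KL term scaled by `complexity_cost_weight`. Assumptions: Gaussian variational posteriors with `sigma = softplus(rho)` and a standard normal prior, so the KL has a closed form; Blitz may parameterize and estimate the KL differently.

```python
import numpy as np

def softplus(x):
    """log(1 + exp(x)), computed stably; maps an unconstrained rho to sigma > 0."""
    return np.logaddexp(0.0, x)

def kl_to_standard_normal(mu, sigma):
    """Closed-form KL( N(mu, sigma^2) || N(0, 1) ), summed over all parameters."""
    return float(np.sum(0.5 * (sigma**2 + mu**2 - 1.0) - np.log(sigma)))

def sampled_elbo_loss(inputs, labels, mu, rho, rng, sample_nbr=3, complexity_cost_weight=1.0):
    """Monte Carlo estimate of an ELBO-style loss for y = w * x + b.

    mu and rho hold the variational parameters of [w, b]. Each draw samples the
    parameters with the reparameterization trick, computes the MSE data term and
    adds the KL term scaled by complexity_cost_weight; the draws are averaged.
    """
    sigma = softplus(rho)
    kl = kl_to_standard_normal(mu, sigma)
    losses = []
    for _ in range(sample_nbr):
        w, b = mu + sigma * rng.standard_normal(2)
        data_term = np.mean((w * inputs + b - labels) ** 2)
        losses.append(data_term + complexity_cost_weight * kl)
    return float(np.mean(losses))

def predictive_mean_std(inputs, mu, rho, rng, n_samples=100):
    """Predict n_samples times with freshly sampled [w, b]; return mean and std per input."""
    sigma = softplus(rho)
    draws = mu + sigma * rng.standard_normal((n_samples, 2))   # each row is one [w, b]
    preds = draws[:, :1] * inputs + draws[:, 1:]              # shape (n_samples, len(inputs))
    return preds.mean(axis=0), preds.std(axis=0)

x = np.linspace(-7.0, 7.0, 50)
y = 2.0 * x + 1.0
mu = np.array([2.0, 1.0])        # posterior means for [w, b]
rho = np.array([-30.0, -30.0])   # softplus(-30) is about 1e-13: a nearly deterministic model
loss_no_kl = sampled_elbo_loss(x, y, mu, rho, np.random.default_rng(0), complexity_cost_weight=0.0)
loss_with_kl = sampled_elbo_loss(x, y, mu, rho, np.random.default_rng(0), complexity_cost_weight=1e-3)
print(loss_no_kl, loss_with_kl)  # the KL term penalizes the overconfident, near-zero sigma

mean, std = predictive_mean_std(np.array([0.0, 10.0]), mu, np.array([-1.0, -1.0]),
                                np.random.default_rng(1), n_samples=2000)
print(mean, std)                 # the band is wider at x = 10 than at x = 0
```

Raising `complexity_cost_weight` pulls the posteriors toward the prior (wider bands), while lowering it lets the data term dominate; that trade-off is what the weight in the video controls.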
💡 The video highlights the importance of estimating uncertainty in model parameters and visualizing confidence bands, and demonstrates how to use techniques like Monte Carlo dropout, deep ensembles, and Bayesian neural networks to achieve this.
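The Monte Carlo dropout procedure from the video (keep dropout active at prediction time, run 100 stochastic forward passes, then take the mean and standard deviation) can be sketched in NumPy as follows. The weights are random placeholders standing in for a trained network, and the helper names `init_mlp`, `mlp_forward` and `mc_dropout_predict` are hypothetical. In PyTorch, the equivalent of `dropout_active=True` is leaving the model in training mode instead of calling `model.eval()`, as described in the video; the dropout rate of 0.2 matches the value chosen there.

```python
import numpy as np

def init_mlp(rng, hidden=64):
    """Random weights for a 1 -> hidden -> 1 MLP (placeholders for trained weights)."""
    w1 = rng.normal(0.0, 1.0, size=(1, hidden))
    b1 = rng.normal(0.0, 1.0, size=hidden)
    w2 = rng.normal(0.0, 1.0 / np.sqrt(hidden), size=(hidden, 1))
    b2 = np.zeros(1)
    return w1, b1, w2, b2

def mlp_forward(x, params, rng, p_drop=0.2, dropout_active=True):
    """One forward pass; x has shape (N, 1) and the output has shape (N, 1)."""
    w1, b1, w2, b2 = params
    h = np.tanh(x @ w1 + b1)
    if dropout_active and p_drop > 0.0:
        keep = rng.random(h.shape) >= p_drop   # keep each hidden unit with prob 1 - p_drop
        h = h * keep / (1.0 - p_drop)          # inverted dropout: rescale the kept units
    return h @ w2 + b2

def mc_dropout_predict(x, params, rng, n_samples=100, p_drop=0.2):
    """Run n_samples stochastic passes (fresh dropout mask each time); return mean and std."""
    preds = np.stack([mlp_forward(x, params, rng, p_drop) for _ in range(n_samples)])
    return preds.mean(axis=0), preds.std(axis=0)

rng = np.random.default_rng(0)
params = init_mlp(rng)
x_test = np.linspace(-10.0, 10.0, 5).reshape(-1, 1)   # same range as the test set in the video
mean, std = mc_dropout_predict(x_test, params, rng)
for xi, m, s in zip(x_test[:, 0], mean[:, 0], std[:, 0]):
    print(f"x={xi:+6.1f}  mean={m:+.3f}  band=[{m - s:+.3f}, {m + s:+.3f}]")
```

Because these weights are untrained, the spread in `std` says nothing about in- versus out-of-distribution inputs here; the sketch only shows the sampling mechanics that turn dropout into an uncertainty estimate.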

Chapters (6)

0:00 Introduction
0:23 Notebook execution notice
1:50 Bayesian Neural Network
8:00 Monte Carlo Dropout
10:15 Deep Ensemble
13:02 Summary