ResNet - Explained!

CodeEmporium · Beginner · 📐 ML Fundamentals · 8mo ago

Skills: CV Basics 80% · ML Pipelines 60% · Supervised Learning 50%

Key Takeaways

The video explains the ResNet network, its advantages over shallower networks, and its implementation in code, using resources such as the ResNet paper and code on GitHub.

Full Transcript

Greetings, fellow learners. In this video we are going to talk about ResNet: the what, the why, and the how. So what is ResNet? It is a network that makes use of residual, or skip, connections. That's these connections over here. So why and how do we use it? To understand this, let's talk about the object recognition pipeline and build out the logic of how we get to the ResNet that we see.

First, we have the object recognition pipeline, which takes in an image and determines the classification of that image. In 2012, we had AlexNet, which was the state of the art for object recognition. It was basically a network with a sequence of convolution, activation, and pooling layers, along with some feed-forward layers, to map an image to an object category. Over the years, a few more architectures were introduced to make this more performant. For example, in 2014 we had VGGNet, a suite of architectures that used stacks of smaller 3×3 convolutions to make the network deeper and hence more performant than AlexNet. Around the same time we also had the Inception architecture, again a network that was much deeper and wider in order to simulate sparse connections, and that hence used pointwise, or 1×1, convolutions. This too was more performant than AlexNet. So the general consensus around this time was that deeper networks can increase performance.

Knowing this, researchers at Microsoft asked: what happens if we go deeper still? At this point, let's take a look at some code to see what's happening. In my Colab notebook over here, I have six models, and we're going to go through the first four right now. The first model is just a basic AlexNet: a sequence of convolutions, activations, and max poolings. And each of these convolutions can be of different sizes.
Some are 7×7 and some are 5×5. Training this model, you'll see a final accuracy of, let's say, 64% on the CIFAR-10 data set for image classification. Now what happens if we use some form of VGGNet, which basically means we replace the larger 7×7 and 5×5 convolutions with stacks of just 3×3 convolutions? In doing so, you're going to find that the network isn't even learning. So now we have a very deep network that stops training. The reason for this is the vanishing gradient problem. I can show this in code: in the later layers we have gradients that are much larger, whereas in the earliest layers we have gradients on the order of 10^-5. This means that updates at the beginning of the network are a hundred or even a thousand times smaller than updates at the end of the network. Hence the network does not learn, and we run into this issue.

To solve this vanishing gradient problem, we can add batch normalization between the layers. So I take the exact same network and add batch normalization after every single convolution layer. With batch normalization, we get the most performant architecture so far, with 80% accuracy. And if you look at the gradient flow, paying attention to the weights over here, you can see that it's much more palatable: on the order of 10^-2 instead of the 10^-5 it used to be. This is also more comparable to what we see in the later layers of the network. So there is no vanishing gradient problem, and we can see that reflected in the network's performance.
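The vanishing-gradient effect described above can be illustrated with a toy example. This is a minimal sketch, not the notebook code from the video: it assumes a chain of scalar sigmoid units (the hypothetical `layer_gradients` helper, the weight value 0.5, and the depth of 10 are all made up for illustration) and backpropagates by hand to show how the gradient reaching the earliest layers shrinks with every layer it passes through.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def layer_gradients(weights, x):
    """Return |d(output)/d(w_k)| for a chain h_k = sigmoid(w_k * h_{k-1})."""
    # Forward pass: keep every activation; h[0] is the input.
    h = [x]
    for w in weights:
        h.append(sigmoid(w * h[-1]))

    # Backward pass: walk from the last layer back to the first.
    grads = [0.0] * len(weights)
    upstream = 1.0  # d(output)/d(output)
    for k in reversed(range(len(weights))):
        local = h[k + 1] * (1.0 - h[k + 1])       # derivative of the sigmoid
        grads[k] = abs(upstream * local * h[k])   # gradient w.r.t. this layer's weight
        upstream = upstream * local * weights[k]  # gradient passed to the layer below
    return grads

# Toy setting (an assumption for illustration): 10 layers, all weights 0.5.
weights = [0.5] * 10
grads = layer_gradients(weights, x=1.0)
for k, g in enumerate(grads, start=1):
    print(f"layer {k:2d}: |grad| = {g:.2e}")
```

Each backward step multiplies the gradient by a factor well below 1 (the sigmoid derivative is at most 0.25), so the first layer's gradient ends up many orders of magnitude smaller than the last layer's. That is the same pattern as the per-layer gradient magnitudes discussed in the transcript, though in a far simpler setting than a convolutional network.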
So what happens now if we try to go even deeper, adding more convolution and activation layers, for example? We have a model 4 over here where we did just that. We took the exact same network as before, but we added about 10 convolution, batch normalization, and activation layers in sequence. When we train this, we notice that the network is definitely training, as the accuracy does get better, but the accuracy is not better than in the previous, shallower network. We also notice that the train accuracy, while it does get better, is much slower at getting better. The same goes for the validation accuracy: it improves, but much more slowly than for the shallower counterpart. For this deeper network it was about 29, 49, and 60 for the first three epochs. If you scroll over to the shallower network, it's about 44, 63, 70. So the shallower network learns much more quickly.

So why does this happen? You can see here that it's not really a problem of vanishing gradients, because the gradients are still quite healthy, around 10^-1 and 10^-2, which seems par for the course compared to all the other cases we have seen. So we don't have a vanishing gradient problem, but training is much slower to improve, and so is testing, and this is what we call performance degradation. Let's now look at the theory of what this is and why it occurs, and then we'll see how ResNet can potentially solve this issue.

So, performance degradation: what is it? It is a phenomenon where, for a deeper network, the training and testing error is worse than for its shallower counterpart. This is exactly what we saw in code. Now, why does performance degradation happen? To illustrate this, let's take a simple example.
I'm going to take two blocks of convolution, batch normalization, and activation, right over here and here. And we're going to train an image classifier, an object recognizer. So it'll take this image and output an object category. Let's assume that this small network is powerful enough to capture all the nonlinearities required to map the image to this output category. So just imagine that this network's performance is 95% or so, and we can't really go much higher than that.

Now let's consider another network. Let's say we made it deeper by doubling the number of layers to four. In theory, a deeper network should be at least as performant as a shallow network, as the deeper layers should be able to learn a pass-through, or identity, function. So you can imagine that these layers have the exact same parameters, for example, and then, if there's a tensor here, the extra layers map it to the exact same tensor here, effectively mimicking the shallower network. So it should perform the same, but in practice it does not.

So in theory, and that's what we wrote out here, we have a tensor, and it should be mapped to the exact same tensor. But in practice, it's actually mapped to a slightly distorted tensor. Why this happens is that these networks learn through backpropagation. It is an estimation technique that updates the weights, the configuration parameters, little by little. The set of parameters that maps a tensor to the exact same tensor is a very specific solution that is very difficult for our optimizer to find. And because it's so difficult to find, what you end up with instead is a slightly distorted tensor.
And you can imagine that as you get deeper and deeper, with multiple layers like this, the intermediate tensors might each be only slightly distorted, but the distortions can keep adding up to the point that you get a very distorted tensor over here. And when you make an output from this tensor, you might even get a wrong object category classification. So we effectively have a deeper network with lower performance, and this is performance degradation, and also why it occurs.

So now that we know why it can occur, how do you solve this problem? Researchers at Microsoft thought: why don't we modify the network structure to make it easier to emulate a pass-through, or identity, function? And they did this using skip connections. Take the original case, where we have this tensor and we get a slightly distorted tensor over here. What if we now create this residual, or skip, connection over here? Basically, we take the activation from here and then apply the ReLU activation only after we take the sum of these two arms.

Now this is great, because it is far easier to simulate a pass-through: the last convolution or batch normalization can just learn to be zero. This batch normalization can learn to output zeros, or this convolution can have filters such that the output becomes zero on applying the convolution. In either case you get zero for this arm, and the tensor essentially just passes through along the other arm. So it's far easier to get a very similar, or the very same, tensor after this operation. And now the residual arm, which is this connection over here, is only really going to learn anything if it is beneficial to the network.
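The pass-through argument above can be checked with a small sketch. It is an illustration only, not the code shared with the video: plain Python lists stand in for tensors, small matrices stand in for the convolutions, batch normalization is left out, and the names `plain_block`, `residual_block`, `W1`, and `zeros` are made up for this example. It shows that when the last layer of the residual arm outputs zeros, the block returns its input unchanged, whereas the same weights in a plain block wipe the input out.

```python
def relu(v):
    return [max(0.0, a) for a in v]

def matvec(W, v):
    # Matrix-vector product on plain lists (stands in for a convolution).
    return [sum(w * a for w, a in zip(row, v)) for row in W]

def add(u, v):
    return [a + b for a, b in zip(u, v)]

def plain_block(x, W1, W2):
    """Two stacked layers with no skip connection."""
    return relu(matvec(W2, relu(matvec(W1, x))))

def residual_block(x, W1, W2):
    """Same two layers, but the input is added back before the final ReLU."""
    branch = matvec(W2, relu(matvec(W1, x)))  # the residual arm
    return relu(add(x, branch))               # ReLU applied after the sum

# Arbitrary first-layer weights; the last layer of the arm is all zeros.
W1 = [[0.2, -0.1, 0.4], [0.0, 0.3, -0.5], [0.7, 0.1, 0.1]]
zeros = [[0.0] * 3 for _ in range(3)]
x = [0.5, 2.0, 1.25]  # non-negative, like the output of an earlier ReLU

print(plain_block(x, W1, zeros))     # [0.0, 0.0, 0.0]
print(residual_block(x, W1, zeros))  # [0.5, 2.0, 1.25]
```

Note that the exact pass-through relies on the input being non-negative, which holds when it comes out of an earlier ReLU; the final ReLU would otherwise clip negative entries.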
So if this part is not capturing all the nonlinearities needed to perform the image classification, it's only the extra information that will be learned here. But if it is powerful enough, then this would almost simulate a pass-through. And so what we can do is repeat these skip connections throughout the network, as we see here. And that's it. With skip connections, it is trivial to model a pass-through function, and hence a network of any depth can be at least as performant as its shallower counterpart.

So let's now go back to our code and see this in action. Now we have our fifth model, where we're creating ResNet: we add skip connections to our previous VGGNet-style architecture, which has a bunch of convolution, batch normalization, and activation blocks right here. So we take the exact same network and just add residual connections, which I'm denoting by residuals over here. If we do this, and I'll share the code later so you can see exactly how it's coded, you'll see that you get performance that's as good as, well, better than any other performance we had seen previously. So we have now gotten rid of this performance, or training, degradation issue. You can see that these training values are much, much higher now. At the same time, it also has the ability to mitigate vanishing gradients, so there is an added benefit there.

Now, as an added bonus, let's say we go even deeper than this. Honestly, all I did was add even more layers: I added about 10 more convolution blocks, and I just wanted to see what would happen. Well, if you do this and train it, you'll see that you still get really good performance, maintained with very minimal performance or training degradation.
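The added benefit mentioned above, that skip connections also help with vanishing gradients, can be seen in a toy calculation. This is a rough sketch under simplifying assumptions, not the video's models: each layer is a single scalar `tanh` unit, the ReLU after the sum and batch normalization are omitted, and `input_gradient` with its parameters is a hypothetical helper. It compares how much gradient reaches the input of a plain chain versus a chain with skip connections.

```python
import math

def input_gradient(depth, w, x, skip):
    """d(output)/d(input) for a chain of `depth` scalar layers.

    Plain layer:    h_next = tanh(w * h)
    Residual layer: h_next = h + tanh(w * h)
    """
    h, grad = x, 1.0
    for _ in range(depth):
        t = math.tanh(w * h)
        local = w * (1.0 - t * t)  # derivative of tanh(w * h) w.r.t. h
        if skip:
            local += 1.0           # the skip path contributes a constant 1
            h = h + t
        else:
            h = t
        grad *= local              # chain rule across layers
    return grad

for depth in (2, 10, 20):
    plain = input_gradient(depth, w=0.3, x=1.0, skip=False)
    residual = input_gradient(depth, w=0.3, x=1.0, skip=True)
    print(f"depth {depth:2d}: plain {plain:.2e}  residual {residual:.2e}")
```

In the plain chain every layer multiplies the gradient by at most 0.3, so it collapses as depth grows. With the skip path, every factor is 1 plus a non-negative term in this setup (positive `w`), so the gradient never drops below 1 regardless of depth.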
And so I'm going to share all of this code in the description below, so feel free to play around with it. I hope everything here made sense.

Quiz time. Have you been paying attention? Let's quiz you to find out. Why use residual connections? A, to avoid the dying ReLU problem. B, so performance of deeper networks can at least match the shallower counterparts. C, to avoid performance degradation. Or D, to mitigate vanishing gradients. Multiple options may be correct here, and I'll give you a few seconds to answer this question. The correct options are B, C, and D. Did you get them right? Comment your reasoning down in the comments below and let's have a discussion. And at this point, if you do think I deserve it, please do consider giving this video a like, because it will help me out a lot. That's going to do it for quiz time and for this video.

But before we go, let's generate a summary. In this video, we took a look at the what, why, and how of ResNet. We started with the definition: it is a network that makes use of residual, or skip, connections. Then we understood, through VGGNet and Inception, that deeper networks can increase performance. But what happens if you go even deeper than that? The issue we run into is performance degradation. It's a phenomenon where, for a deeper network, the training and test error is worse than for its shallower counterpart. We also took a look at how and why this happens. Because of the nature of optimization during backpropagation, it is very difficult for this chunk of network to represent a pass-through function. And these distortions add up over time to the point where we might even get worse predictions, and hence performance degrades. To solve this, we add a residual connection, or skip connection, over here.
With this, this part of the network will only learn something if it is beneficial to the network; otherwise it'll just simulate a pass-through, if the network is already powerful enough to capture the mapping between this image and the object category. Hence we add them throughout the network. And so with skip connections, it is trivial to model a pass-through function, and hence a network of any depth can be at least as performant as its shallower counterpart. We also took a look at a few models to demonstrate that this is the case in practice too.

And that's all that I have for you today. If you think I deserve it, please do consider giving this video a like. All resources are going to be down in the description below: the code, the paper, the slides. So do check them out. To continue your AI journey, do click on this video right over here. And I will see you in the next one.

Original Description

In this video, we take a look at the ResNet network. What is it? Why is it better than some of the shallower networks that came before it? How do we implement this in code?

ABOUT ME
⭕ Subscribe: https://www.youtube.com/c/CodeEmporium?sub_confirmation=1
📚 Medium Blog: https://medium.com/@dataemporium
💻 Github: https://github.com/ajhalthor
👔 LinkedIn: https://www.linkedin.com/in/ajay-halthor-477974bb/

RESOURCES
[1 📚] Slides used in the video: https://link.excalidraw.com/p/readonly/Oj623wJMmvUZxfF5dyXl
[2 📚] Main paper of the video: https://arxiv.org/pdf/1512.03385
[3 📚] Code for ResNet network: https://github.com/ajhalthor/computer-vision-101

PLAYLISTS FROM MY CHANNEL
⭕ Reinforcement Learning: https://youtube.com/playlist?list=PLTl9hO2Oobd9kS--NgVz0EPNyEmygV1Ha&si=AuThDZJwG19cgTA8
Natural Language Processing: https://youtube.com/playlist?list=PLTl9hO2Oobd_bzXUpzKMKA3liq2kj6LfE&si=LsVy8RDPu8jeO-cc
⭕ Transformers from Scratch: https://youtube.com/playlist?list=PLTl9hO2Oobd_bzXUpzKMKA3liq2kj6LfE
⭕ ChatGPT Playlist: https://youtube.com/playlist?list=PLTl9hO2Oobd9coYT6XsTraTBo4pL1j4HJ
⭕ Convolutional Neural Networks: https://youtube.com/playlist?list=PLTl9hO2Oobd9U0XHz62Lw6EgIMkQpfz74
⭕ The Math You Should Know : https://youtube.com/playlist?list=PLTl9hO2Oobd-_5sGLnbgE8Poer1Xjzz4h
⭕ Probability Theory for Machine Learning: https://youtube.com/playlist?list=PLTl9hO2Oobd9bPcq0fj91Jgk_-h1H_W3V
⭕ Coding Machine Learning: https://youtube.com/playlist?list=PLTl9hO2Oobd82vcsOnvCNzxrZOlrz3RiD

MATH COURSES (7 day free trial)
📕 Mathematics for Machine Learning: https://imp.i384100.net/MathML
📕 Calculus: https://imp.i384100.net/Calculus
📕 Statistics for Data Science: https://imp.i384100.net/AdvancedStatistics
📕 Bayesian Statistics: https://imp.i384100.net/BayesianStatistics
📕 Linear Algebra: https://imp.i384100.net/LinearAlgebra
📕 Probability: https://imp.i384100.net/Probability

OTHER RELATED COURSES (7 day free trial)
📕 ⭐ Deep Learning Specialization: https://imp.i

This video explains the ResNet network, a deep learning architecture used for image classification, and provides resources for implementation in code. Viewers can learn how to implement ResNet and understand its advantages over shallower networks.

Key Takeaways

Learn about the ResNet network and its architecture
Understand the advantages of ResNet over shallower networks
Implement ResNet in code using Python and GitHub resources
Train and test the ResNet model for image classification

💡 The ResNet network is a deep learning architecture that uses residual connections to ease the training process and improve the performance of the model.
