Pytorch VGG implementation from scratch

Aladdin Persson · Beginner ·📄 Research Papers Explained ·6y ago


Key Takeaways

The video implements VGG16 from scratch in PyTorch and then generalizes the code to VGG11, VGG13, and VGG19, covering the network's architecture, the convolutional and fully connected layers, and the model implementation.

Full Transcript

[Music] In this video we're going to walk through how to code the VGG architecture in PyTorch, but let's start by understanding how it works. Here's the VGG paper. We're not going to read the entire paper; we're just going to look at the part where they mention the implementation details, specifically the sentence saying that the convolution stride is fixed to one pixel and that the spatial padding of the conv layer input is such that the spatial resolution is preserved after convolution, i.e. the padding is one pixel for 3x3 conv layers. So if the kernel is 3 by 3, the padding is 1 and the stride is 1. Also, max pooling is performed over a 2 by 2 window with a stride of 2.

Then they have several different VGG architectures. The one we're going to focus on most is VGG16, because it's the most popular one, but at the end of the video I'll show you how to implement each of them so you can choose. It's called VGG16 because it has 16 weight layers: 13 convolutional layers and 3 fully connected layers.

Going through it briefly: they write conv3-64, where the 3 means a 3 by 3 kernel. As we saw in the text, a 3 by 3 kernel goes with a padding of 1 and a stride of 1, and that's the nice thing about the VGG architecture: with those parameters, the spatial resolution (the height and width of the feature maps) stays the same after every conv layer, so these are "same" convolutions. The last number is the number of output channels, or filters. We input an RGB image, so we have three input channels; after the first layer there are 64 output channels, then 64 again, then 128, 256, and so on. The number of channels doubles after each block (64, 128, 256, 512) until it reaches 512; in VGG16 the first two blocks have two conv layers each and the later blocks have three. In between those blocks we have a max pool with a 2 by 2 kernel and a stride of 2, which halves the spatial size.

So let's say the input image is 224 by 224. After each of these conv layers it stays exactly the same, 224 by 224, but after leaving the max pool it's 112 by 112, after the next one it's 112 divided by 2, and so on. After all of the conv layers we go through a final max pool, and then we have three fully connected layers. That's the basic architecture.

Now let's go to the code and build this from scratch. I've summarized the VGG architecture as a list where the integer values represent the output channels of a conv layer and "M" means a max pool. That list is the conv part of the network; after it we flatten and use three linear layers.

We create a class called VGG_net that inherits from nn.Module. In the init we take the number of in channels and the number of classes, and the first thing we do is call super(VGG_net, self).__init__(), which runs the init of the parent class. We could do what we've done in previous videos and write self.conv1 = nn.Conv2d(...) and so on for every layer, but we're going to do something more clever: it generalizes better, lets us implement all of the VGG architectures, and keeps the code cleaner. So we'll have a forward method as always, plus another method called create_conv_layers that takes the architecture list.

First we set self.in_channels = in_channels, with defaults of 3 in channels and 1000 classes. Then we set self.conv_layers = self.create_conv_layers(VGG16), passing in the VGG16 list. All the information we need is stored in that list, because the kernel is always 3 by 3 with a padding of 1 and a stride of 1; the only thing that changes from layer to layer is the number of output channels.

In create_conv_layers we start with layers as an empty list and in_channels = self.in_channels, then loop over the list with for x in architecture. If the type of x is int, we know it's a conv layer, so out_channels = x; for the first layer the in channels are 3 and the out channels are 64. We then do layers += [...] with an nn.Conv2d, setting in_channels, out_channels, kernel_size=(3, 3) (we could just write 3, but (3, 3) makes it clear), stride=(1, 1), and padding=(1, 1). Next we do something slightly different from the paper: we add a BatchNorm2d layer followed by a ReLU, which is a pretty standard conv, batch norm, ReLU block. You don't have to include the batch norm. It isn't in the original VGG paper only because batch norm hadn't been invented when VGG was published, and we include it here because it improves performance.

Then, since x is the number of output channels of the layer we just created, we set in_channels = x so the next conv layer gets the right number of input channels; after the first element in the list that's 64. If x isn't an integer, it's a string, so elif x == "M" we add a max pool with a kernel size of (2, 2) and a stride of (2, 2). At the end we return nn.Sequential(*layers): the star unpacks everything we stored in the list, and nn.Sequential turns it into one block containing all the conv, batch norm, ReLU, and max pool layers we created.

That was the conv part; what's left is the flatten and the fully connected layers. We create self.fcs with nn.Sequential again, which makes things more compact when you're stacking a lot of nn.Linear or nn.Conv2d layers. The first nn.Linear takes 512 * 7 * 7 inputs: there are 512 channels, and what's left of the image is 7 by 7. We can quickly calculate that: we start with 224 and there are five max pools, so 224 / 2^5 = 7. The output size is 4096, which is just what the paper chose. Then we add an nn.ReLU and an nn.Dropout. I don't think I mentioned dropout when we went through the paper, but they do use dropout in the first two fully connected layers. Then another Linear from 4096 to 4096, another ReLU, another Dropout, and a final Linear from 4096 to the number of classes.

In forward we do x = self.conv_layers(x), then flatten for the linear part with x = x.reshape(x.shape[0], -1), send the flattened tensor through self.fcs, and return x.

Now let's check that it actually works. We create model = VGG_net(in_channels=3, num_classes=1000) and x = torch.randn(1, 3, 224, 224), a single random "image" with three channels, and print(model(x).shape). We want the output to be 1 by 1000. Running it could be slow, but here it's reasonably fast, and we get 1 by 1000, which is what we expected. This architecture is fairly large (perhaps not by today's standards), so if you don't have a great CPU it might take a while. What we can do is set device = "cuda" if torch.cuda.is_available() else "cpu" and call .to(device) on both the model and the input, which should make it faster. Now it runs on the GPU.

So far we've only implemented VGG16, but the way we implemented it is very general, so we only have to change one thing to get a general implementation. I replace the single list with a dictionary, VGG_types, containing VGG11, VGG13, VGG16, and VGG19. The fully connected part is the same for all of these architectures; the only difference between, say, VGG16 and VGG19 is that VGG19 has more conv layers, and that's captured in the list. So all we change is to pass VGG_types["VGG16"], which works exactly the same, and we can switch to VGG11, VGG13, or VGG19 depending on which one you want to use. Hopefully this was clear. If you have any questions, please leave them in the comments. Thank you so much for watching the video, and I hope to see you in the next one.
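The size bookkeeping in the walkthrough (3x3 "same" convolutions, 2x2 max pools that halve the resolution, and 224 / 2^5 = 7 feeding the first linear layer) can be checked with a few lines of plain Python. This is a minimal sketch, not code from the video: the VGG_types dictionary is my reconstruction of the configuration lists described in the video (an integer is a conv layer's output channels, "M" is a max pool), and the helpers conv_out and trace_shapes are hypothetical names introduced only for illustration.

```python
# Reconstructed configuration lists: an integer is the number of output
# channels of a 3x3 conv layer, "M" is a 2x2 max pool with stride 2.
VGG_types = {
    "VGG11": [64, "M", 128, "M", 256, 256, "M", 512, 512, "M", 512, 512, "M"],
    "VGG13": [64, 64, "M", 128, 128, "M", 256, 256, "M", 512, 512, "M", 512, 512, "M"],
    "VGG16": [64, 64, "M", 128, 128, "M", 256, 256, 256, "M",
              512, 512, 512, "M", 512, 512, 512, "M"],
    "VGG19": [64, 64, "M", 128, 128, "M", 256, 256, 256, 256, "M",
              512, 512, 512, 512, "M", 512, 512, 512, 512, "M"],
}

NUM_FC_LAYERS = 3  # the three fully connected layers after the conv part


def conv_out(size, kernel, stride, padding):
    """Output height/width of a convolution or pooling window."""
    return (size + 2 * padding - kernel) // stride + 1


def trace_shapes(architecture, size=224, in_channels=3):
    """Return (channels, height/width) after every entry in the list."""
    channels = in_channels
    shapes = []
    for x in architecture:
        if isinstance(x, int):
            # 3x3 kernel, stride 1, padding 1: spatial size is preserved
            size = conv_out(size, kernel=3, stride=1, padding=1)
            channels = x
        elif x == "M":
            # 2x2 window, stride 2, no padding: spatial size is halved
            size = conv_out(size, kernel=2, stride=2, padding=0)
        shapes.append((channels, size))
    return shapes


for name, arch in VGG_types.items():
    n_conv = sum(isinstance(x, int) for x in arch)
    final_channels, final_size = trace_shapes(arch)[-1]
    print(f"{name}: {n_conv} conv + {NUM_FC_LAYERS} FC = "
          f"{n_conv + NUM_FC_LAYERS} weight layers, "
          f"final feature map {final_channels}x{final_size}x{final_size}")
# e.g. VGG16: 13 conv + 3 FC = 16 weight layers, final feature map 512x7x7
```

Every variant ends in a 512x7x7 feature map for a 224x224 input, which is why the fully connected part (first layer of size 512 * 7 * 7 = 25088 inputs) can be shared across VGG11, VGG13, VGG16, and VGG19, while the number in each name is just the conv count plus three.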

Original Description

In this video we go through the network and code VGG16, and also VGG11, VGG13, and VGG19, in PyTorch from scratch.

The VGG Paper: https://arxiv.org/abs/1409.1556

❤️ Support the channel ❤️
https://www.youtube.com/channel/UCkzW5JSFwvKRjXABI-UTAkQ/join

Paid Courses I recommend for learning (affiliate links, no extra cost for you):
⭐ Machine Learning Specialization https://bit.ly/3hjTBBt
⭐ Deep Learning Specialization https://bit.ly/3YcUkoI
📘 MLOps Specialization http://bit.ly/3wibaWy
📘 GAN Specialization https://bit.ly/3FmnZDl
📘 NLP Specialization http://bit.ly/3GXoQuP

✨ Free Resources that are great:
NLP: https://web.stanford.edu/class/cs224n/
CV: http://cs231n.stanford.edu/
Deployment: https://fullstackdeeplearning.com/
FastAI: https://www.fast.ai/

💻 My Deep Learning Setup and Recording Setup: https://www.amazon.com/shop/aladdinpersson

GitHub Repository: https://github.com/aladdinpersson/Machine-Learning-Collection

✅ One-Time Donations:
Paypal: https://bit.ly/3buoRYH

▶️ You Can Connect with me on:
Twitter - https://twitter.com/aladdinpersson
LinkedIn - https://www.linkedin.com/in/aladdin-persson-a95384153/
Github - https://github.com/aladdinpersson

OUTLINE
0:00 - Introduction
0:19 - VGG Paper Review
3:38 - Coding the VGG



Playlist

Uploads from Aladdin Persson · 33 of 60

computeCost.m Linear Regression Cost Function - Machine Learning
gradientDescent.m Gradient Descent Implementation - Machine Learning
Neural Network from scratch - Part 1 (Standard Notation)
Neural Network from scratch - Part 2 (Forward Propagation)
Neural Network from scratch - Part 3 (Backward Propagation)
Neural Network from scratch - Part 4 (With Python)
sigmoid.m - Programming Assignment 2 Machine Learning
costFunction.m - Programming Assignment 2 Machine Learning
predict.m - Programming Assignment 2 Machine Learning
costFunctionReg.m - Programming Assignment 2 Machine Learning
lrCostFunction.m - Programming Assignment 3 Machine Learning
oneVsAll.m - Programming Assignment 3 Machine Learning
predictOneVsAll.m - Programming Assignment 3 Machine Learning
predict.m - Programming Assignment 3 Machine Learning
Caesar Cipher Encryption and Decryption with example
Cryptography: Caesar Cipher Python
Vigenere Cipher Explained (with Example)
Cryptography: Vigenere Cipher Python
Hill Cipher Explained (with Example)
Cryptography: Hill Cipher Python
Interval Scheduling Greedy Algorithm: Python
Weighted Interval Scheduling Algorithm Explained
Weighted Interval Scheduling Python Code
Sequence Alignment | Needleman Wunsch Algorithm
Sequence Alignment | Needleman Wunsch in Python
Codility BinaryGap Python
Codility CyclicRotation Python
Derivation Linear Regression with Gradient Descent
Linear Regression Gradient Descent From Scratch in Python
Pytorch Neural Network example
Pytorch CNN example (Convolutional Neural Network)
Pytorch LeNet implementation from scratch
Pytorch VGG implementation from scratch
Pytorch GoogLeNet / InceptionNet implementation from scratch
How to save and load models in Pytorch
How to build custom Datasets for Images in Pytorch
Pytorch Transfer Learning and Fine Tuning Tutorial
Pytorch Data Augmentation using Torchvision
Pytorch Quick Tip: Weight Initialization
Pytorch Quick Tip: Using a Learning Rate Scheduler
Pytorch ResNet implementation from Scratch
Pytorch TensorBoard Tutorial
Pytorch DCGAN Tutorial (See description for updated video)
Naive Bayes from Scratch - Machine Learning Python
Spam Classifier using Naive Bayes in Python
K-Nearest Neighbor from scratch - Machine Learning Python
Linear Regression Normal Equation Python
SVM from Scratch - Machine Learning Python (Support Vector Machine)
Neural Network from Scratch - Machine Learning Python
Pytorch RNN example (Recurrent Neural Network)
Pytorch Bidirectional LSTM example
Pytorch Text Generator with character level LSTM
Logistic Regression from Scratch - Machine Learning Python
K-Means Clustering from Scratch - Machine Learning Python
Pytorch Torchtext Tutorial 1: Custom Datasets and loading JSON/CSV/TSV files
Pytorch Torchtext Tutorial 2: Built in Datasets with Example
Pytorch Torchtext Tutorial 3: From Textfiles to Dataset
Paper Review: Sequence to Sequence Learning with Neural Networks
Pytorch Seq2Seq Tutorial for Machine Translation
Pytorch Seq2Seq with Attention for Machine Translation

This video teaches how to implement VGG16 from scratch in PyTorch and, through a configuration dictionary, VGG11, VGG13, and VGG19 as well, covering the network's architecture, the convolutional and fully connected layers, and the model implementation. It provides a step-by-step guide on how to create a PyTorch model from a research paper.

Key Takeaways

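Create a class called VGG_net that inherits from nn.Module
Define an __init__ method that calls the parent class's __init__ and stores the number of input channels
Create a forward method that runs the conv layers, flattens the result, and passes it through the fully connected layers
Create a create_conv_layers method that builds the convolutional layers from an architecture list
Call create_conv_layers with the architecture list (e.g. the VGG16 configuration) to construct the convolutional part
For each integer in the list, add a 3x3 Conv2d (stride 1, padding 1), BatchNorm2d, and ReLU, then set the input channels for the next layer to that integer
Add a 2x2 max pooling layer with stride 2 when the entry is "M" instead of an integer
Return nn.Sequential(*layers) for the convolutional part, and build the fully connected layers as a separate nn.Sequential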
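The steps above can be sketched as a single PyTorch module. This is a minimal reconstruction of the approach described in the video, not a copy of the author's repository code: names such as VGG_net, create_conv_layers, conv_layers, fcs, and VGG_types follow the transcript, the architecture argument for picking a variant by name is an added convenience (the video switches variants by editing the dictionary key), and dropout p=0.5 is an assumption based on the paper's stated dropout ratio.

```python
import torch
import torch.nn as nn

# Integer = output channels of a 3x3 conv (stride 1, padding 1);
# "M" = 2x2 max pool with stride 2.
VGG_types = {
    "VGG11": [64, "M", 128, "M", 256, 256, "M", 512, 512, "M", 512, 512, "M"],
    "VGG13": [64, 64, "M", 128, 128, "M", 256, 256, "M", 512, 512, "M", 512, 512, "M"],
    "VGG16": [64, 64, "M", 128, 128, "M", 256, 256, 256, "M",
              512, 512, 512, "M", 512, 512, 512, "M"],
    "VGG19": [64, 64, "M", 128, 128, "M", 256, 256, 256, 256, "M",
              512, 512, 512, 512, "M", 512, 512, 512, 512, "M"],
}


class VGG_net(nn.Module):
    def __init__(self, in_channels=3, num_classes=1000, architecture="VGG16"):
        super().__init__()
        self.in_channels = in_channels
        # Convolutional part, built from the configuration list
        self.conv_layers = self.create_conv_layers(VGG_types[architecture])
        # Fully connected part: a 224x224 input is 512x7x7 after five max pools
        self.fcs = nn.Sequential(
            nn.Linear(512 * 7 * 7, 4096),
            nn.ReLU(),
            nn.Dropout(p=0.5),
            nn.Linear(4096, 4096),
            nn.ReLU(),
            nn.Dropout(p=0.5),
            nn.Linear(4096, num_classes),
        )

    def forward(self, x):
        x = self.conv_layers(x)
        x = x.reshape(x.shape[0], -1)  # flatten to (batch, 512 * 7 * 7)
        return self.fcs(x)

    def create_conv_layers(self, architecture):
        layers = []
        in_channels = self.in_channels
        for x in architecture:
            if isinstance(x, int):
                layers += [
                    nn.Conv2d(in_channels, x, kernel_size=3, stride=1, padding=1),
                    nn.BatchNorm2d(x),  # not in the original paper; added in the video
                    nn.ReLU(),
                ]
                in_channels = x  # the next conv takes this layer's output channels
            elif x == "M":
                layers += [nn.MaxPool2d(kernel_size=2, stride=2)]
        return nn.Sequential(*layers)


# Quick shape check on a single random 224x224 RGB "image"
device = "cuda" if torch.cuda.is_available() else "cpu"
model = VGG_net(in_channels=3, num_classes=1000).to(device)
x = torch.randn(1, 3, 224, 224).to(device)
print(model(x).shape)  # torch.Size([1, 1000])
```

Keeping all per-layer choices in a list means switching to another variant is a one-argument change, e.g. VGG_net(architecture="VGG19"); the fully connected head stays the same because every variant ends in a 512x7x7 feature map for a 224x224 input.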

💡 Implementing a model from a research paper requires a thorough understanding of the paper's content and the ability to translate it into code.
