Pytorch VGG implementation from scratch
Key Takeaways
The video implements VGG16 from scratch in PyTorch and then generalizes the same code to VGG11, VGG13, and VGG19, covering the implementation details from the paper, the convolutional and fully connected layers, and a quick shape check of the model.
Full Transcript
In this video we're going to walk through how to code the VGG architecture in PyTorch, but let us start with understanding how it works first. So here's the VGG paper. We're not going to read the entire paper; we're just going to look at the part where they mention the implementation details. Specifically, the sentence: the convolution stride is fixed to one pixel; the spatial padding of the conv layer input is such that the spatial resolution is preserved after convolution, i.e. the padding is one pixel for 3x3 conv layers. So if the kernel is three by three, the padding is one and the stride is one. Also, max pooling is performed over a two by two kernel with a stride of two.

Then they have some different VGG architectures. So really there are several different VGG architectures, and the one that we're going to focus most on is VGG16, because that's the one that's most popular. I'm actually going to show you at the end of the video how to implement each of them so you can choose, but the one we're going to focus on is this one. Why it's called VGG16 is because it has 16 weight layers.

If we go through it just briefly: they say conv3, and the 3 here means that it's a 3 by 3 kernel, right? As we saw above in the text, a 3 by 3 kernel is associated with a padding of 1 and a stride of 1. That's the nice thing about the VGG architecture: using those parameters, the image resolution, the spatial size of the features, always stays the same, so it's a same convolution. The last number here is the number of channels or filters; that's the output channels. So if we input RGB we have three input channels, after the first layer it's 64 output channels, 64, and then 128, 256. We see that the number of channels is just doubled after, in this case, two conv layers, two conv layers, and then doubled after three conv layers. In between those blocks, I guess you could call them, we have a max pooling, and the max pooling is with a kernel of 2 by 2 and a stride of 2, which means that the
size is halved. So let's say we have a 224 by 224 image as input; then after each of these conv layers it's going to stay exactly the same, so it's going to be 224 by 224 after this, but after leaving the max pool it's going to be 112 by 112. Similarly, here it's going to be 112 divided by 2, etc. After doing all of those conv layers we go to another max pool, and then we have three fully connected layers. So that's the basics of the architecture. Let's go back to the code and try to code this from scratch.

I've summarized the VGG architecture like this, where the integer values represent the output channels after performing that conv layer, and if we write 'M' that means that it's a max pool. After doing all of those, this being the conv layers part of the network, we do a flatten and we use three linear layers.

What we want to do now is essentially create a class, which we'll call VGG_net, that will inherit from nn.Module. We'll define an init; let's say we also input the number of in_channels and the number of classes that we're going to use. The first thing that we're going to do is call super(VGG_net, self).__init__(), so here we run the init of the parent class. Now, we could do like we've done in the previous videos, like self.conv1 = nn.Conv2d et cetera, and specify all of these. What we're going to do is actually something more clever, I think; it's going to generalize better, we're going to be able to implement all of the VGG architectures, and the code is going to be cleaner as well. So we're going to have a forward, as we always have, and then we're also going to create another function that we're going to call create_conv_layers, and we're going to send in the architecture. The first thing that we want to do here is define self.in_channels to be the number of in_channels, and we can set them to three and a
thousand by default. Then what we want to do is call this self.conv_layers; we'll create all the conv layers from this function. So we're going to call this with our VGG16, the list of how to construct the conv layers, since really all the information is stored in this list here, right? Since we know that it's always going to be a three by three kernel with a padding of one and a stride of one, that information is always the same no matter what the output channels are.

So what we want to do here is call layers to be an empty list, set in_channels to be self.in_channels, and then go through each element in that architecture, in this list. So let's write for x in architecture, and we're going to check if the type of x is int; then we know that it's going to be a conv layer, right? The first thing we do is set out_channels to be x. We know that at first in_channels is three and then it's going to be 64 out channels. Then we do layers += and add all of those conv layers to layers with this for loop, so this is going to make the code a lot cleaner. We set in_channels to be in_channels, out_channels to be out_channels, kernel_size to be three by three (we don't have to write three by three, but we can do that to be clear), and then a stride of one and a padding of one.

Then you'll see that what we're going to do is actually a little bit different from the paper. You don't have to include this, but there's really no reason not to, since the only reason why it's not there in the VGG architecture is because it wasn't invented at that time: we're going to include a batch norm layer, and then we're going to do a ReLU, right? It's a pretty standard convolution, batch norm, ReLU. You
can remove this one; this is not included in the original VGG paper. We just include it here because it's going to improve performance. Let's see, is there something missing? Another parenthesis here.

Then, since x is the out channels currently, we need to update the in channels for the next layer that we're going to create. So in_channels now needs to be equal to x; the in channels are going to be 64 after the first element in the list. But if it's not an integer value, we know that it's a string, so I guess we could just do elif x == 'M', and then all we're going to do is add a max pool, and we know that the kernel size is 2 by 2 and the stride is 2 by 2. The only thing we need to do in the end, when we return it, is call nn.Sequential and do *layers, essentially unpacking everything that we stored in the list, and that's going to create an entire block of all of those conv, batch norm, ReLU layers that we've created.

So we've already called create_conv_layers, and I guess it's self.create_conv_layers. The only thing left is to create the fully connected layers, right? That was the conv part, and we have the flatten and the linear layers left. So we're going to do self.fcs for the fully connected layers, and we're going to again use nn.Sequential. nn.Sequential is useful when we're using a lot of nn.Linear and nn.Conv2d; it can make things more compact if we just include them in an nn.Sequential. All we want to do now is create the linear part, so we have nn.Linear, and the number of channels that we have is going to be 512, and what we're going to have left of the image is seven by seven. I guess we could calculate it quickly, so
we have 224, and then we have one max pool, two max pools, three max pools, four max pools, five max pools, so we have 224 divided by 2 raised to 5, which is 7. The next size is going to be 4096; this is just what they chose. Then we do an nn.ReLU and then nn.Dropout. I don't believe I mentioned that they used dropout when we went through the implementation details in the paper, but they use dropout as well in the linear layers. Then they have another linear, 4096 to 4096, another ReLU and another dropout, and another linear, and we're going to set the output of the last one to the number of classes.

That's it: we create the conv layers and then the fully connected part, and now we want to call them in forward. So we want to do x = self.conv_layers(x), and then we want to reshape it, because now we want to flatten it for the linear part. So what we're going to do is just x = x.reshape(x.shape[0], -1), and then on that flattened tensor we call the fully connected part, and we just want to return x.

Okay, so now we want to check that it actually works. Let's set model to be VGG_net; we can set in_channels to 3 and num_classes to 1000. We can do torch.randn and generate a single image with 3 channels, 224 by 224, and then we print model(x).shape. So remember, what we do now is just randomly generate some data that has the form of an image, in this case a single image, and then we print the shape, and we want it to be 1 by 1000 in this case. So let's run this; this actually might be slow. Okay, pretty fast. So 1 by 1000, and that's what we expected.

This architecture is kind of large. It's not large by today's standards, I guess, but still, if you don't have a great CPU this might take a while. What we can do is set device to CUDA if CUDA is
available, else CPU, and then we can just do .to(device) on the model and .to(device) on the input. I think that should make it a little bit faster, perhaps. cuda.is_available... oh yeah, it should be torch.cuda.is_available, and that should also work. Now it's run on the GPU.

One thing now is that we've only implemented it for VGG16, right? But how we implemented it is very general, so let me get a piece of code, and all we're going to do is change one thing and it's going to be a general implementation. I replace this part here, so instead we have a dictionary which includes VGG11, 13, 16 and 19. The flatten part is the same for all of those architectures, so the only difference between VGG16 and 19 is that they have more of these conv layers, and that's represented in the list. So all we want to do now is change this part: we can just use VGG_types['VGG16'], and this should work exactly the same. We can also change this to VGG11, VGG19 or VGG13, depending on the one that you want to use.

Hopefully this was clear. If you have any questions then please leave them in the comments. Thank you so much for watching the video, and I hope to see you in the next one.
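
The steps described in the transcript can be put together roughly as follows. This is a minimal sketch reconstructed from the spoken walkthrough, not necessarily the exact code shown on screen. Several details are assumptions: the `vgg_type` constructor argument (the video switches variants by editing the dictionary key in place), the dropout probability of 0.5, and the VGG11, VGG13 and VGG19 lists, which are filled in here from configurations A, B and E of the VGG paper. The batch norm layer is the addition the video mentions and is not part of the original paper.

```python
import torch
import torch.nn as nn

# Integers are conv output channels; "M" is a 2x2 max pool with stride 2.
# Lists assumed from configurations A, B, D and E of the VGG paper.
VGG_types = {
    "VGG11": [64, "M", 128, "M", 256, 256, "M", 512, 512, "M", 512, 512, "M"],
    "VGG13": [64, 64, "M", 128, 128, "M", 256, 256, "M",
              512, 512, "M", 512, 512, "M"],
    "VGG16": [64, 64, "M", 128, 128, "M", 256, 256, 256, "M",
              512, 512, 512, "M", 512, 512, 512, "M"],
    "VGG19": [64, 64, "M", 128, 128, "M", 256, 256, 256, 256, "M",
              512, 512, 512, 512, "M", 512, 512, 512, 512, "M"],
}


class VGG_net(nn.Module):
    # vgg_type is a hypothetical convenience argument, not from the video.
    def __init__(self, in_channels=3, num_classes=1000, vgg_type="VGG16"):
        super().__init__()
        self.in_channels = in_channels
        self.conv_layers = self.create_conv_layers(VGG_types[vgg_type])
        # Five max pools reduce 224x224 to 7x7 (224 / 2**5), with 512 channels.
        self.fcs = nn.Sequential(
            nn.Linear(512 * 7 * 7, 4096),
            nn.ReLU(),
            nn.Dropout(p=0.5),  # assumed probability
            nn.Linear(4096, 4096),
            nn.ReLU(),
            nn.Dropout(p=0.5),
            nn.Linear(4096, num_classes),
        )

    def forward(self, x):
        x = self.conv_layers(x)
        x = x.reshape(x.shape[0], -1)  # flatten to (batch, 512 * 7 * 7)
        return self.fcs(x)

    def create_conv_layers(self, architecture):
        layers = []
        in_channels = self.in_channels
        for x in architecture:
            if isinstance(x, int):
                out_channels = x
                # 3x3 kernel, stride 1, padding 1 keeps the spatial size.
                layers += [
                    nn.Conv2d(in_channels, out_channels,
                              kernel_size=3, stride=1, padding=1),
                    nn.BatchNorm2d(out_channels),  # not in the original paper
                    nn.ReLU(),
                ]
                in_channels = out_channels  # input channels of the next layer
            elif x == "M":
                layers.append(nn.MaxPool2d(kernel_size=2, stride=2))
        return nn.Sequential(*layers)


if __name__ == "__main__":
    device = "cuda" if torch.cuda.is_available() else "cpu"
    model = VGG_net(in_channels=3, num_classes=1000).to(device)
    x = torch.randn(1, 3, 224, 224).to(device)
    print(model(x).shape)  # torch.Size([1, 1000])
```

Because the first linear layer is sized for a 7 by 7 feature map, this sketch expects 224 by 224 inputs; other input sizes would need a different first `nn.Linear` size.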
Original Description
In this video we go through the network and code the VGG16 and also VGG11, VGG13, VGG19 in PyTorch from scratch.
The VGG Paper:
https://arxiv.org/abs/1409.1556
✨ Free Resources that are great:
NLP: https://web.stanford.edu/class/cs224n/
CV: http://cs231n.stanford.edu/
Deployment: https://fullstackdeeplearning.com/
FastAI: https://www.fast.ai/
GitHub Repository:
https://github.com/aladdinpersson/Machine-Learning-Collection
▶️ You Can Connect with me on:
Twitter - https://twitter.com/aladdinpersson
LinkedIn - https://www.linkedin.com/in/aladdin-persson-a95384153/
Github - https://github.com/aladdinpersson
OUTLINE
0:00 - Introduction
0:19 - VGG Paper Review
3:38 - Coding the VGG