PyTorch Tutorial 04 - Backpropagation - Theory With Example
Key Takeaways
This video, part of a tutorial series on deep learning with PyTorch, explains the backpropagation algorithm: the chain rule, the computational graph with its local gradients, and the forward and backward pass. It works through a linear regression example by hand and then verifies the resulting gradient with PyTorch's autograd system.
Full Transcript
Hi everybody, welcome to a new PyTorch tutorial. In this video I'm going to explain the famous backpropagation algorithm and how we can calculate gradients with it. I'll explain the necessary concepts of this technique, then I will walk you through a concrete example with some numbers, and at the end we will see how easy it is to apply backpropagation in PyTorch. So let's start.

The first concept we must know is the chain rule. Let's say we have two operations, or two functions. First we have the input x, and we apply a function a and get an output y. Then we use this output as the input for our second function b, and we get the final output z. Now we want to minimize z, so we want to know the derivative of z with respect to the x at the beginning, and we can do this using the so-called chain rule. For this we first compute the derivative of z with respect to y and multiply it by the derivative of y with respect to x, and then we get the final derivative we want. So at each position we compute the derivative of the output with respect to its input, and then we multiply them together and get the final gradient we are interested in. That's the chain rule.

The next concept is the so-called computational graph. For every operation we do with our tensors, PyTorch will create a graph for us, where at each node we apply one operation or one function with some inputs and then get an output. In this example we use a multiplication operation, so we multiply x and y and get z. At these nodes we can calculate so-called local gradients, and we can use them later in the chain rule to get the final gradient. Here we can compute two local gradients. The gradient of z with respect to x is simple since we know the function: it is the gradient of x times y with respect to x, which is y. And at the bottom we compute the derivative of x times y with respect to y, which is x. So local gradients are easy because we know the function.

Why do we want them? Because typically our graph has more operations, and at the very end we calculate a loss function that we want to minimize. So we have to calculate the gradient of this loss with respect to our parameter x at the beginning. Now let's suppose that at this position we already know the derivative of the loss with respect to z. Then we can get the final gradient we want with the chain rule: the gradient of the loss with respect to x is the gradient of the loss with respect to z times our local gradient, the derivative of z with respect to x. This is how we get the final gradient.

The whole concept consists of three steps. First we do a forward pass, where we apply all the functions and compute the loss. Then at each node we calculate the local gradients. And then we do a so-called backward pass, where we compute the gradient of the loss with respect to our weights or parameters using the chain rule. These are the three steps we're going to do.

Now we look at a concrete example. Here we want to use linear regression, and if you don't know how this works, then I highly recommend my machine learning from scratch tutorial about linear regression. I will put the link in the description. Basically, we model our output with a linear combination of some weights and an input, so our y hat, or y predicted, is w times x. Then we formulate some loss function. In this case this is the squared error. Actually it should be the mean squared error, but for simplicity we just use the squared error, otherwise we would have another operation to get the mean. So the loss is the difference of the predicted y minus the actual y, and then we square it. Now we want to minimize our loss, so we want to know the derivative of the loss with respect to our weights.

How do we get that? We apply our three steps. First we do a forward pass: we put in the x and the w, then we put in the y and apply our functions, and we get the loss. Then we calculate the local gradients at each node: the gradient of the loss with respect to s, then the gradient of s with respect to y hat, and at the first node the gradient of y hat with respect to w. Then we do a backward pass, so we start at the end. First we have the derivative of the loss with respect to s, then we use it with the chain rule to get the derivative of the loss with respect to y hat, and then again we use this and the chain rule to get the final gradient of the loss with respect to w.

Let's do this with some concrete numbers. Let's say x and y are given: x is 1 and y is 2. These are our training samples. And we initialize our weight, so let's say our w is 1 in the beginning. Then we do the forward pass. At the first node we multiply x and w, so we get y hat equals 1. At the next node we do a subtraction, y hat minus y, so s is 1 minus 2, which equals minus 1. At the very end we square s, so we have s squared, and our loss is then 1.

Now we calculate the local gradients. At the last node we have the gradient of the loss with respect to s, and this is simple because we know the function: it is the gradient of s squared, which is just 2s. At the next node we have the gradient of s with respect to y hat, which is the gradient of the function y hat minus y with respect to y hat, which is just 1. And at the first node we have the derivative of y hat with respect to w, which is the derivative of w times x with respect to w, which is x. Also notice that we don't need to know the derivatives along the other lines of the graph. We don't need to know the derivative of s with respect to y, and we also don't need the derivative with respect to x, because our x and our y are fixed. We are only interested in the parameters that we want to update.

Then we do the backward pass, where we use our local gradients. We want to compute the derivative of the loss with respect to y hat, and here we use the chain rule with the two local gradients that we just computed: 2s times 1, and s is minus 1, which we calculated in the forward pass, so this is minus 2. Now we use this derivative and the remaining local gradient to get the final gradient, the gradient of the loss with respect to w. This is the gradient of the loss with respect to y hat times the gradient of y hat with respect to w, which is minus 2 times x, and x is 1, so the final gradient is minus 2. This is the final gradient that we want to know, and that's how backpropagation works.

Let's jump over to our code and verify that PyTorch gets these exact numbers. Remember, x is 1, y is 2, and w is 1, and our first gradient should be minus 2. So let's see how we can do this in PyTorch. First of all we import torch, of course. Then we create our vectors, or tensors: we say x equals torch.tensor with 1, then y equals torch.tensor with 2, and our initial weight w is also a tensor with 1, written as 1.0 to make it a float. For our weight we are interested in the gradient, so we need to specify requires_grad equals True.

Then we do the forward pass and compute the loss. We simply say y hat equals w times x, which is our function, and then we say loss equals y hat minus the actual y, and we square this, so we raise it to the power of two. Now let's print our loss and see that it is 1 in the beginning.

Now we want to do the backward pass. PyTorch will compute the local gradients automatically for us, and it also computes the backward pass automatically for us, so the only thing we have to call is loss.backward(). This is the whole gradient computation. Now our w has the .grad attribute, and we can print it. This is the first gradient, after the first forward and backward pass. Remember, this should be minus 2 in the beginning, and here we see we have a tensor with minus 2, so this is working.

The next steps would be, for example, to update our weights, then do the next forward and backward pass, and repeat this for a couple of iterations. That's how backpropagation works and how easy it is to use in PyTorch. I hope you enjoyed this tutorial. Please subscribe to the channel, and see you next time. Bye!
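The code described at the end of the transcript is not shown on this page, so the following is a minimal sketch reconstructed from the spoken description rather than the author's exact file. It uses the values from the worked example (x = 1, y = 2, w = 1) and compares a hand-written chain-rule product with the gradient autograd reports. The variable names `s`, `manual_grad`, `grad_w`, and `learning_rate`, as well as the final weight-update step and its learning rate of 0.01, are assumptions added for illustration.

```python
import torch

# Training sample and initial weight from the worked example
x = torch.tensor(1.0)
y = torch.tensor(2.0)
w = torch.tensor(1.0, requires_grad=True)  # only w needs a gradient

# Forward pass: prediction, difference, squared-error loss
y_hat = w * x
s = y_hat - y
loss = s ** 2
print(loss.item())  # 1.0

# Backward pass by hand: multiply the local gradients (chain rule)
dloss_ds = 2 * s.item()   # d(s^2)/ds = 2s
ds_dyhat = 1.0            # d(y_hat - y)/d(y_hat) = 1
dyhat_dw = x.item()       # d(w * x)/dw = x
manual_grad = dloss_ds * ds_dyhat * dyhat_dw
print(manual_grad)  # -2.0

# Backward pass with autograd: fills w.grad with d(loss)/dw
loss.backward()
print(w.grad)  # tensor(-2.)
grad_w = w.grad.item()

# One possible next step (assumed, not shown in the video):
# a plain gradient-descent update with an assumed learning rate
learning_rate = 0.01
with torch.no_grad():
    w -= learning_rate * w.grad
w.grad.zero_()  # clear the gradient before the next backward pass
```

Both routes give minus 2, matching the number derived by hand in the walkthrough. The update is wrapped in `torch.no_grad()` so that it is not itself recorded in the computational graph, and the gradient is zeroed because `backward()` accumulates into `.grad` across calls.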
Original Description
New Tutorial series about Deep Learning with PyTorch!
⭐ Check out Tabnine, the FREE AI-powered code completion tool I use to help me code faster: https://www.tabnine.com/?utm_source=youtube.com&utm_campaign=PythonEngineer *
In this part I will explain the famous backpropagation algorithm. I will explain all the necessary concepts and walk you through a concrete example. At the end we will see how easy it is to use backpropagation in PyTorch.
- Chain Rule
- Computational Graph and local gradients
- Forward and backward pass
- Concrete example with numbers (Linear Regression)
- How to use backpropagation in PyTorch
📚 Get my FREE NumPy Handbook:
https://www.python-engineer.com/numpybook
📓 Notebooks available on Patreon:
https://www.patreon.com/patrickloeber
⭐ Join our Discord: https://discord.gg/FHMg9tKFSN
Part 04: Backpropagation - Theory With Example
If you enjoyed this video, please subscribe to the channel!
Official website:
https://pytorch.org/
Part 01:
https://youtu.be/EMXfZB8FVUA
Linear Regression from scratch:
https://youtu.be/4swNt7PiamQ
Code for this tutorial series:
https://github.com/patrickloeber/pytorchTutorial
You can find me here:
Website: https://www.python-engineer.com
Twitter: https://twitter.com/patloeber
GitHub: https://github.com/patrickloeber
#Python #DeepLearning #Pytorch
----------------------------------------------------------------------------------------------------------
* This is a sponsored link. By clicking on it you will not have any additional costs, instead you will support me and my project. Thank you so much for the support! 🙏