PyTorch Tutorial 04 - Backpropagation - Theory With Example

Patrick Loeber · Beginner · 🧬 Deep Learning · 6y ago

Key Takeaways

This video, part of a tutorial series on deep learning with PyTorch, covers the backpropagation algorithm: the theory behind it and a practical example using PyTorch's autograd system. It shows how to compute local gradients, apply the chain rule, and obtain the gradient of the loss with respect to a weight; updating the weights is mentioned as the next step.

Full Transcript

Hi everybody, welcome to a new PyTorch tutorial. In this video I'm going to explain the famous backpropagation algorithm and how we can calculate gradients with it. I'll explain the necessary concepts of this technique, then walk you through a concrete example with some numbers, and at the end we will see how easy it is to apply backpropagation in PyTorch. So let's start.

The first concept we must know is the chain rule. Let's say we have two operations, or two functions. First we have the input x, and we apply a function a and get an output y. Then we use this output as the input for our second function, b, and get the final output z. Now we want to minimize z, so we want to know the derivative of z with respect to the x at the very beginning, and we can get this using the so-called chain rule. For this we first compute the derivative of z with respect to y and multiply it by the derivative of y with respect to x, and then we get the final derivative we want. So first we compute the derivative at this position, the derivative of this output with respect to this input, then the derivative of this output with respect to this input, and then we multiply them together and get the final gradient we are interested in. That's the chain rule.

The next concept is the so-called computational graph. For every operation we do with our tensors, PyTorch will create a graph for us, where at each node we apply one operation or one function with some inputs and then get an output. In this example we use a multiplication operation, so we multiply x and y and get z. At these nodes we can calculate so-called local gradients, and we can use them later in the chain rule to get the final gradient. Here we can compute two local gradients. The gradient of z with respect to x is simple, since we know the function: it is the gradient of x times y with respect to x, which is y. And at the bottom we compute the derivative of x times y with respect to y, which is x. So local gradients are easy because we know the function.

Why do we want them? Because typically our graph has more operations, and at the very end we calculate a loss function that we want to minimize. So we have to calculate the gradient of this loss with respect to our parameter x at the beginning. Now let's suppose that at this position we already know the derivative of the loss with respect to z. Then we can get the final gradient we want with the chain rule: the gradient of the loss with respect to x is the gradient of the loss with respect to z times our local gradient, the derivative of z with respect to x. And this is how we get the final gradient.

Now, the whole concept consists of three steps. First we do a forward pass, where we apply all the functions and compute the loss. Then at each node we calculate the local gradients. And then we do a so-called backward pass, where we compute the gradient of the loss with respect to our weights or parameters using the chain rule. These are the three steps we're going to do.

Now let's look at a concrete example. Here we want to use linear regression, and if you don't know how this works, then I highly recommend my machine learning from scratch tutorial about linear regression. I will put the link in the description. Basically, we model our output with a linear combination of some weights and an input, so our y hat, or y predicted, is w times x. Then we formulate some loss function, in this case the squared error. Actually it should be the mean squared error, but for simplicity we just use the squared error; otherwise we would have another operation to get the mean. So the loss is the difference of the predicted y minus the actual y, and then we square it. Now we want to minimize our loss, so we want to know the derivative of the loss with respect to our weights.

How do we get that? We apply our three steps. First we do a forward pass: we put in x and w, then we put in y, apply our functions, and get the loss. Then we calculate the local gradients at each node: the gradient of the loss with respect to s, the difference y hat minus y, then the gradient of s with respect to y hat, and at this node the gradient of y hat with respect to w. Then we do a backward pass. We start at the end, where we have the derivative of the loss with respect to s. We use it together with the chain rule to get the derivative of the loss with respect to y hat, and then again we use this and the chain rule to get the final gradient of the loss with respect to w.

So let's do this with some concrete numbers. Let's say x and y are given: x is 1 and y is 2. These are our training samples. And we initialize our weight, so let's say w is 1 in the beginning. Then we do the forward pass. At the first node we multiply x and w, so we get y hat equals 1. At the next node we do a subtraction: y hat minus y is 1 minus 2, which equals minus 1. And at the very end we square our s, so we have s squared, and our loss is 1.

Now we calculate the local gradients. At the last node we have the gradient of the loss with respect to s, and this is simple because we know the function: it is the gradient of s squared, which is just 2s. At the next node we have the gradient of s with respect to y hat, which is the gradient of the function y hat minus y with respect to y hat, which is just 1. And at the multiplication node we have the derivative of y hat with respect to w, so this is the derivative of w times x with respect to w, which is x. Also notice that we don't need to know the derivatives along the other graph lines: we don't need the derivative of s with respect to y, and we don't need the one with respect to x either, because our x and our y are fixed. We are only interested in the parameters that we want to update.

Then we do the backward pass. First we use our local gradients to compute the derivative of the loss with respect to y hat. Here we use the chain rule with the two local gradients that we just computed, which is 2s times 1, and s is minus 1, which we calculated up here, so this is minus 2. Now we use this derivative and also this local gradient to get the final gradient, the gradient of the loss with respect to w, which is the gradient of the loss with respect to y hat times the gradient of y hat with respect to w, which is minus 2 times x, and x is 1, so the final gradient is minus 2. This is the final gradient that we want to know. And that's how backpropagation works.

Let's jump over to our code and verify that PyTorch gets these exact numbers. Remember, x is 1, y is 2, and w is 1, and our first gradient should be minus 2. So let's see how we can do this in PyTorch. First of all we import torch, of course. Then we create our vectors, or tensors: we say x equals torch.tensor with 1, then y equals torch.tensor with 2, and our initial weight is also a tensor with 1, written as 1.0 to make it a float. For our weight we are interested in the gradient, so we need to specify requires_grad equals True. Then we do the forward pass and compute the loss. We simply say y hat equals w times x, which is our function, and then loss equals y hat minus the actual y, and we square this, so we raise it to the power of two. Now let's print our loss and see that it is 1 in the beginning.

Now we want to do the backward pass. PyTorch will compute the local gradients automatically for us and also computes the backward pass automatically for us, so the only thing we have to call is loss.backward(). This is the whole gradient computation. Now our w has this .grad attribute, and we can print it. This is the first gradient, after the first forward and backward pass, and remember, it should be minus 2 in the beginning. And here we see we have a tensor with minus 2, so this is working. The next steps would be, for example, to update our weights and then do the next forward and backward pass, and repeat this for a couple of iterations.

And that's how backpropagation works, and also how easy it is to use in PyTorch. I hope you enjoyed this tutorial. Please subscribe to the channel, and see you next time. Bye.
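
The code described at the end of the transcript can be sketched roughly as follows. This is a reconstruction from the spoken description, not the author's original file: the variable name y_hat and the comments are assumptions, while torch.tensor, requires_grad, backward(), and the .grad attribute are standard PyTorch.

```python
import torch

# Training sample and initial weight from the worked example.
x = torch.tensor(1.0)
y = torch.tensor(2.0)
# requires_grad=True asks autograd to track operations on w.
w = torch.tensor(1.0, requires_grad=True)

# Forward pass: prediction and squared-error loss.
y_hat = w * x
loss = (y_hat - y) ** 2
print(loss)

# Backward pass: autograd applies the chain rule through the graph.
loss.backward()
print(w.grad)
```

Running this should show a loss of 1 and a gradient of -2 for w, matching the hand calculation in the transcript.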

Original Description

New Tutorial series about Deep Learning with PyTorch!

⭐ Check out Tabnine, the FREE AI-powered code completion tool I use to help me code faster: https://www.tabnine.com/?utm_source=youtube.com&utm_campaign=PythonEngineer *

In this part I will explain the famous backpropagation algorithm. I will explain all the necessary concepts and walk you through a concrete example. At the end we will see how easy it is to use backpropagation in PyTorch.

- Chain Rule
- Computational Graph and local gradients
- Forward and backward pass
- Concrete example with numbers (Linear Regression)
- How to use backpropagation in PyTorch

📚 Get my FREE NumPy Handbook: https://www.python-engineer.com/numpybook
📓 Notebooks available on Patreon: https://www.patreon.com/patrickloeber
⭐ Join Our Discord: https://discord.gg/FHMg9tKFSN

Part 04: Backpropagation - Theory With Example

If you enjoyed this video, please subscribe to the channel!

Official website: https://pytorch.org/
Part 01: https://youtu.be/EMXfZB8FVUA
Linear Regression from scratch: https://youtu.be/4swNt7PiamQ
Code for this tutorial series: https://github.com/patrickloeber/pytorchTutorial

You can find me here:
Website: https://www.python-engineer.com
Twitter: https://twitter.com/patloeber
GitHub: https://github.com/patrickloeber

#Python #DeepLearning #Pytorch

* This is a sponsored link. By clicking on it you will not have any additional costs, instead you will support me and my project. Thank you so much for the support! 🙏


This tutorial teaches the fundamentals of backpropagation through theory and a worked linear regression example. It covers the forward pass, the local gradients at each node, and the backward pass with the chain rule, then confirms the hand-computed gradient of the loss with respect to the weight using PyTorch's autograd system by calling loss.backward().
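
As a compact restatement of the worked example from the video (x = 1, y = 2, initial w = 1), the three steps can be written out as equations. The symbol s for the intermediate difference follows the video; writing the loss as L is a notational assumption made here.

```latex
\begin{align*}
\text{Forward pass:}\quad
  & \hat{y} = w x = 1, \qquad s = \hat{y} - y = -1, \qquad L = s^2 = 1 \\
\text{Local gradients:}\quad
  & \frac{\partial L}{\partial s} = 2s = -2, \qquad
    \frac{\partial s}{\partial \hat{y}} = 1, \qquad
    \frac{\partial \hat{y}}{\partial w} = x = 1 \\
\text{Backward pass:}\quad
  & \frac{\partial L}{\partial \hat{y}}
    = \frac{\partial L}{\partial s} \cdot \frac{\partial s}{\partial \hat{y}}
    = -2 \cdot 1 = -2, \qquad
    \frac{\partial L}{\partial w}
    = \frac{\partial L}{\partial \hat{y}} \cdot \frac{\partial \hat{y}}{\partial w}
    = -2 \cdot 1 = -2
\end{align*}
```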

Key Takeaways
  1. Do a forward pass to apply all functions and compute the loss
  2. Calculate local gradients at each node in the computational graph
  3. Do a backward pass to compute the gradient of the loss with respect to weights or parameters using the chain rule
  4. Define the forward pass: y_hat = w * x
  5. Compute the loss: loss = (y_hat - y)**2
  6. Call loss.backward() to compute the gradient, then read it from w.grad
  7. Update the weight w using the gradient (mentioned in the video as the next step)
💡 The backpropagation algorithm is a key component of supervised learning models, and PyTorch's autograd system provides an efficient way to compute the gradient of the loss with respect to the weights.
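
Takeaways 4 to 7 can be combined into a small loop. The video only mentions the weight update as the next step, so the update rule below (plain gradient descent), the learning rate of 0.1, and the 20 iterations are assumptions chosen for illustration rather than values from the tutorial.

```python
import torch

x = torch.tensor(1.0)
y = torch.tensor(2.0)
w = torch.tensor(1.0, requires_grad=True)

learning_rate = 0.1  # assumed value, not from the video
num_steps = 20       # assumed value, not from the video

for step in range(num_steps):
    # Forward pass and loss.
    y_hat = w * x
    loss = (y_hat - y) ** 2

    # Backward pass: fills w.grad with d(loss)/dw.
    loss.backward()

    # Gradient descent update, kept out of the autograd graph.
    with torch.no_grad():
        w -= learning_rate * w.grad

    # Gradients accumulate across backward() calls, so reset them.
    w.grad.zero_()

print(w.item(), loss.item())
```

With these settings w moves from 1 toward 2, the value at which w * x equals y for this single sample, and the loss shrinks toward 0.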
