PyTorch Tutorial 04 - Backpropagation - Theory With Example

Patrick Loeber · Beginner · 🧬 Deep Learning · 6y ago

Skills: ML Maths Basics 90% · Supervised Learning 80%

Key Takeaways

This video, part of a tutorial series on deep learning with PyTorch, explains the backpropagation algorithm: the chain rule, the computational graph and its local gradients, and the forward and backward pass. It works through a one-weight linear regression example by hand and then verifies the resulting gradient with PyTorch's autograd system. Updating the weight is mentioned only as the next step.
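As a compact sketch of the chain rule applied to the worked example in the transcript: the symbols (y hat, s, w) and the values x = 1, y = 2, w = 1 come from the video, while the notation below is only one possible way to write it down.

```latex
\begin{align*}
\hat{y} &= w \cdot x, \qquad s = \hat{y} - y, \qquad \text{loss} = s^2 \\
\frac{\partial\,\text{loss}}{\partial w}
  &= \frac{\partial\,\text{loss}}{\partial s}
     \cdot \frac{\partial s}{\partial \hat{y}}
     \cdot \frac{\partial \hat{y}}{\partial w}
   = 2s \cdot 1 \cdot x \\
  &= 2 \cdot (-1) \cdot 1 \cdot 1 = -2
  \qquad \text{for } x = 1,\; y = 2,\; w = 1
\end{align*}
```

Each factor in the product is one local gradient, and multiplying them from the loss back to w is the backward pass.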

Full Transcript

Hi everybody, welcome to a new PyTorch tutorial. In this video I'm going to explain the famous backpropagation algorithm and how we can calculate gradients with it. I explain the necessary concepts of this technique, then I will walk you through a concrete example with some numbers, and at the end we will see how easy it is to apply backpropagation in PyTorch.

So let's start. The first concept we must know is the chain rule. Let's say we have two operations, or two functions. First we have the input x, then we apply a function a and get an output y. Then we use this output as the input for our second function b, and we get the final output z. Now we want to minimize our z, so we want to know the derivative of z with respect to our x here in the beginning, and we can do this using the so-called chain rule. For this we first compute the derivative of z with respect to y and multiply it by the derivative of y with respect to x, and then we get the final derivative we want. So first we compute the derivative at this position, the derivative of this output with respect to this input, and then here the derivative of this output with respect to this input, and then we multiply them together and get the final gradient we are interested in. That's the chain rule.

The next concept is the so-called computational graph. For every operation we do with our tensors, PyTorch will create a graph for us, where at each node we apply one operation or one function with some inputs and then get an output. In this example we use a multiplication operation, so we multiply x and y and get z. At these nodes we can calculate so-called local gradients, and we can use them later in the chain rule to get the final gradient. Here we can compute two local gradients. The gradient of z with respect to x is simple, since we know this function: it is the gradient of x times y with respect to x, which is y. And here at the bottom we compute the derivative of x times y with respect to y, which is x. So local gradients are easy because we know this function.

Why do we want them? Because typically our graph has more operations, and at the very end we calculate a loss function that we want to minimize. So we have to calculate the gradient of this loss with respect to our parameter x in the beginning. Now let's suppose that at this position we already know the derivative of the loss with respect to our z. Then we can get the final gradient we want with the chain rule: the gradient of the loss with respect to x is the gradient of the loss with respect to z times our local gradient, the derivative of z with respect to x. This is how we get the final gradient.

The whole concept consists of three steps. First we do a forward pass, where we apply all the functions and compute the loss. Then at each node we calculate the local gradients. And then we do a so-called backward pass, where we compute the gradient of the loss with respect to our weights or parameters using the chain rule. So these are the three steps we're going to do.

Now we look at a concrete example. Here we want to use linear regression, and if you don't know how this works, I highly recommend my Machine Learning From Scratch tutorial about linear regression. I will put the link in the description. Basically, we model our output with a linear combination of some weights and an input, so our y hat, or y predicted, is w times x. Then we formulate some loss function. In this case this is the squared error. Actually it should be the mean squared error, but for simplicity we just use the squared error; otherwise we would have another operation to get the mean. So the loss is the difference between the predicted y and the actual y, and then we square it. Now we want to minimize our loss, so we want to know the derivative of the loss with respect to our weights.

How do we get that? We apply our three steps. First we do a forward pass: we put in the x and the w, then here we put in the y, apply our functions, and get the loss. Then we calculate the local gradients at each node: here the gradient of the loss with respect to our s, then the gradient of s with respect to our y hat, and at this node the gradient of y hat with respect to our w. Then we do a backward pass, so we start at the end. First we have the derivative of the loss with respect to our s, then we use it together with the chain rule to get the derivative of the loss with respect to y hat, and then again we use this and the chain rule to get the final gradient of the loss with respect to our w.

Let's do this with some concrete numbers. Let's say x and y are given, so x is 1 and y is 2; these are our training samples. We initialize our weight, so let's say our w is 1 in the beginning. Then we do the forward pass. At the first node we multiply x and w, so we get y hat equals 1. At the next node we do a subtraction, y hat minus y, so s is 1 minus 2, which equals minus 1. And at the very end we square our s, so we have s squared, and our loss is 1.

Now we calculate the local gradients. At the last node we have the gradient of the loss with respect to s, and this is simple because we know the function: it is the gradient of s squared, so this is just 2s. At the next node we have the gradient of s with respect to y hat, which is the gradient of the function y hat minus y with respect to y hat, which is just 1. And here at the last node we have the derivative of y hat with respect to w, so the derivative of w times x with respect to w, which is x. Also notice that we don't need to know the derivatives along these other lines of the graph: we don't need the derivative of s with respect to y, and we don't need this one here either, because our x and our y are fixed. We are only interested in the parameters that we want to update.

Then we do the backward pass. First we use our local gradients to compute the derivative of the loss with respect to y hat. Here we use the chain rule with the two local gradients that we just computed, which is 2s times 1, and s is minus 1, which we calculated up here, so this is minus 2. Now we use this derivative and also this local gradient to get the final gradient, the gradient of the loss with respect to our w, which is the gradient of the loss with respect to y hat times the gradient of y hat with respect to w. That is minus 2 times x, and x is 1, so the final gradient is minus 2. This is the final gradient that we want to know, and that's how backpropagation works.

Let's jump over to our code and verify that PyTorch gets these exact numbers. Remember, x is 1, y is 2 and w is 1, and then our first gradient should be minus 2. So let's see how we can do this in PyTorch. First of all we import torch, of course. Then we create our tensors, so we say x equals torch.tensor with 1, then our y equals torch.tensor with 2, and our initial weight is also a tensor with 1, so 1.0 to make it a float. For our weight we are interested in the gradient, so we need to specify requires_grad=True. Then we do the forward pass and compute the loss. We simply say y hat equals w times x, which is our function, and then we say loss equals y hat minus the actual y, and then we square this, so we say this to the power of two. Now let's print our loss and see that it is 1 in the beginning.

Now we want to do the backward pass. PyTorch will compute the local gradients automatically for us and also computes the backward pass automatically for us, so the only thing that we have to call is loss.backward(). This is the whole gradient computation. Now our w has this .grad attribute, and we can print it. This is the first gradient after the first forward and backward pass, and remember, it should be minus 2 in the beginning. And here we see we have a tensor with minus 2, so this is working.

The next steps would be, for example, to update our weights, then do the next forward and backward pass, and do this for a couple of iterations. That's how backpropagation works, and also how easy it is to use it in PyTorch. I hope you enjoyed this tutorial. Please subscribe to the channel, and see you next time. Bye!
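The code described at the end of the transcript can be sketched roughly as follows. The variable names follow the spoken walkthrough; writing x and y as floats (1.0 and 2.0) is an assumption made here for consistency with the weight, not something the transcript states.

```python
import torch

# Training sample and initial weight from the worked example.
# x and y are written as floats here (an assumption); only w tracks gradients.
x = torch.tensor(1.0)
y = torch.tensor(2.0)
w = torch.tensor(1.0, requires_grad=True)

# Forward pass: prediction and squared-error loss.
y_hat = w * x
loss = (y_hat - y) ** 2
print(loss)    # loss value 1.0

# Backward pass: autograd applies the chain rule through the recorded graph.
loss.backward()
print(w.grad)  # gradient of the loss with respect to w: -2.0
```

Because only w was created with requires_grad=True, x.grad and y.grad stay None, which matches the remark that the derivatives with respect to the fixed inputs are not needed.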

Original Description

New Tutorial series about Deep Learning with PyTorch!

⭐ Check out Tabnine, the FREE AI-powered code completion tool I use to help me code faster: https://www.tabnine.com/?utm_source=youtube.com&utm_campaign=PythonEngineer *

In this part I will explain the famous backpropagation algorithm. I will explain all the necessary concepts and walk you through a concrete example. At the end we will see how easy it is to use backpropagation in PyTorch.

- Chain Rule
- Computational Graph and local gradients
- Forward and backward pass
- Concrete example with numbers (Linear Regression)
- How to use backpropagation in PyTorch

📚 Get my FREE NumPy Handbook: https://www.python-engineer.com/numpybook
📓 Notebooks available on Patreon: https://www.patreon.com/patrickloeber
⭐ Join Our Discord: https://discord.gg/FHMg9tKFSN

Part 04: Backpropagation - Theory With Example

If you enjoyed this video, please subscribe to the channel!

Official website: https://pytorch.org/
Part 01: https://youtu.be/EMXfZB8FVUA
Linear Regression from scratch: https://youtu.be/4swNt7PiamQ
Code for this tutorial series: https://github.com/patrickloeber/pytorchTutorial

You can find me here:
Website: https://www.python-engineer.com
Twitter: https://twitter.com/patloeber
GitHub: https://github.com/patrickloeber

#Python #DeepLearning #Pytorch

* This is a sponsored link. By clicking on it you will not have any additional costs, instead you will support me and my project. Thank you so much for the support! 🙏

This video tutorial teaches the fundamentals of backpropagation, covering both the theory and a practical example in PyTorch. It shows how to compute local gradients, combine them with the chain rule, and obtain the gradient of the loss with respect to the weight using PyTorch's autograd system. The running example is a one-weight linear regression with a squared-error loss; using the gradient to update the weight is named as the next step.

Key Takeaways

Do a forward pass to apply all functions and compute the loss
Calculate local gradients at each node in the computational graph
Do a backward pass to compute the gradient of the loss with respect to weights or parameters using the chain rule
Define the forward pass: y_hat = w * x
Compute the loss: loss = (y_hat - y)^2
Call loss.backward() to compute the gradient, then read it from w.grad
Update the weight w using the gradient (the next step after the video's example)
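The steps above can also be sketched by hand in plain Python, without autograd, to make the local gradients and the update explicit. The function name forward_backward, the learning rate of 0.1 and the five iterations are illustrative assumptions; the video only names the update as a next step and does not give these values.

```python
def forward_backward(w, x, y):
    """Return (loss, d_loss/d_w) for y_hat = w * x with a squared-error loss."""
    # Forward pass: apply each operation in turn.
    y_hat = w * x
    s = y_hat - y
    loss = s ** 2

    # Local gradients at each node.
    dloss_ds = 2 * s   # derivative of s^2 with respect to s
    ds_dyhat = 1.0     # derivative of (y_hat - y) with respect to y_hat
    dyhat_dw = x       # derivative of (w * x) with respect to w

    # Backward pass: chain the local gradients from the loss back to w.
    dloss_dyhat = dloss_ds * ds_dyhat
    dloss_dw = dloss_dyhat * dyhat_dw
    return loss, dloss_dw


# Values from the worked example.
x, y = 1.0, 2.0
w = 1.0
learning_rate = 0.1  # assumed value, not taken from the video

loss, grad = forward_backward(w, x, y)
print(loss, grad)  # 1.0 -2.0

# A few gradient descent updates (hypothetical continuation of the example).
history = []
for step in range(5):
    loss, grad = forward_backward(w, x, y)
    history.append(loss)
    w -= learning_rate * grad  # move w against the gradient
```

With these assumed settings the loss shrinks on every iteration as w moves from 1 toward 2, the value at which w * x equals y.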

💡 Backpropagation is how the gradient of the loss with respect to the weights is computed during training, and PyTorch's autograd system performs it automatically when you call loss.backward().
