Visualizing the chain rule and product rule | Chapter 4, Essence of calculus
Key Takeaways
The video explains the sum rule, product rule, and chain rule from calculus. It builds a visual picture for each one (stacked graph heights, the area of an adjustable box, and three linked number lines) to show where these derivative formulas come from. These rules are part of the mathematical foundations used in machine learning.
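As a compact reference, the three rules can be written in symbols as below. This is a conventional summary rather than something quoted from the video: g and h are the generic function names used in the transcript, both are assumed to be differentiable, and the last three lines are the transcript's own examples.

```latex
\begin{align*}
\frac{d}{dx}\bigl(g(x) + h(x)\bigr) &= \frac{dg}{dx} + \frac{dh}{dx} && \text{(sum rule)} \\
\frac{d}{dx}\bigl(g(x)\,h(x)\bigr) &= g(x)\,\frac{dh}{dx} + h(x)\,\frac{dg}{dx} && \text{(product rule)} \\
\frac{d}{dx}\,g\bigl(h(x)\bigr) &= \frac{dg}{dh}\bigl(h(x)\bigr)\,\frac{dh}{dx} && \text{(chain rule)} \\[1ex]
\frac{d}{dx}\bigl(\sin(x) + x^2\bigr) &= \cos(x) + 2x \\
\frac{d}{dx}\bigl(\sin(x)\,x^2\bigr) &= \sin(x)\cdot 2x + x^2\cos(x) \\
\frac{d}{dx}\,\sin(x^2) &= \cos(x^2)\cdot 2x
\end{align*}
```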
Full Transcript
[Music] In the last videos I talked about the derivatives of simple functions, and the goal was to have a clear picture or intuition to hold in your mind that actually explains where these formulas come from. But of course, most of the functions you deal with in modeling the world involve somehow mixing, combining, or tweaking these simple functions in some other way. So our natural next step is to understand how you take derivatives of more complicated combinations. And again, I don't want these to be something to memorize; I want you to have a clear picture in mind for where each one comes from.
Now, this really boils down to three basic ways to combine functions: you can add them together, you can multiply them, and you can throw one inside the other, known as composing them. Sure, you could say subtracting them, but really that's just multiplying the second by -1 and adding them together. And likewise, dividing functions doesn't really add anything, because that's the same as plugging one inside the function 1/x and then multiplying the two together. So really, most functions you come across just involve layering together these three different types of combinations, though there's not really a bound on how monstrous things can become. But as long as you know how derivatives play with just those three combination types, you'll always be able to take it step by step and peel through the layers for any kind of monstrous expression.
So the question is: if you know the derivatives of two functions, what is the derivative of their sum, of their product, and of the function composition between them?
The sum rule is easiest, if somewhat tongue-twisting to say out loud: the derivative of a sum of two functions is the sum of their derivatives. But it's worth warming up with this example by really thinking through what it means to take a derivative of a sum of two functions, since the derivative patterns for products and for function composition won't be so straightforward, and they're going to require this kind of deeper thinking.
For example, let's think about this function f(x) = sin(x) + x^2. It's a function where, for every input, you add together the values of sin(x) and x^2 at that point. For example, let's say at x = 0.5, the height of the sine graph is given by this vertical bar, the height of the x^2 parabola is given by this slightly smaller vertical bar, and their sum is the length you get by just stacking them together.
Now for the derivative, you want to ask what happens as you nudge that input slightly, maybe increasing it up to 0.5 + dx. The difference in the value of f between those two places is what we call df. And when you picture it like this, I think you'll agree that the total change in the height is whatever the change to the sine graph is, what we might call d(sin(x)), plus whatever the change to x^2 is, d(x^2).
Now, we know that the derivative of sine is cosine, and remember what that means: it means that this little change d(sin(x)) is about cos(x) * dx. It's proportional to the size of our initial nudge dx, and the proportionality constant equals the cosine of whatever input we happen to start at. Likewise, because the derivative of x^2 is 2x, the change in the height of the x^2 graph is going to be about 2 * x * whatever dx was. So rearranging, df/dx, the ratio of the tiny change to this sum function to the tiny change in x that caused it, is indeed cos(x) + 2x, the sum of the derivatives of its parts.
But like I said, things are a bit different for products, and let's think through why in terms of tiny nudges again. In this case, I don't think graphs are our best bet for visualizing things. Pretty commonly in math, at a lot of levels of math really, if you're dealing with a product of two things, it helps to understand it as some kind of area. In this case, maybe you try to configure some mental setup of a box where the side lengths are sin(x) and x^2. But what would that mean? Well, since these are functions, you might think of those sides as adjustable, dependent on the value of x, which maybe you think of as this number that you can just freely adjust up and down.
So, getting a feel for what this means, focus on that top side there, which changes as the function sin(x). As you change this value of x up from zero, it increases up to a length of 1 as sin(x) moves up towards its peak, and after that it starts to decrease as sin(x) comes down from 1. And in the same way, that height there is always changing as x^2. So f(x), defined as the product of these two functions, is going to be the area of this box.
And for the derivative, let's think about how a tiny change to x by dx influences that area. What is that resulting change in area, df? Well, the nudge dx caused that width to change by some small d(sin(x)), and it caused that height to change by some d(x^2). And this gives us three little snippets of new area: a thin rectangle on the bottom, whose area is its width sin(x) times its thin height d(x^2); this thin rectangle on the right, whose area is its height x^2 times its thin little width d(sin(x)); and this little bit in the corner. But we can ignore that last one. Its area is ultimately going to be proportional to (dx)^2, and as we've seen before, that becomes negligible as dx goes to zero.
I mean, this whole setup is very similar to what I showed last video with the x^2 diagram. And just like then, keep in mind that I'm using somewhat beefy changes here to draw things, just so that we can actually see them, but in principle dx is something very, very small, and that means that d(x^2) and d(sin(x)) are also very, very small. So applying what we know about the derivative of sine and of x^2, that tiny change d(x^2) is going to be about 2x * dx, and that tiny change d(sin(x)), well, that's going to be about cos(x) * dx. As usual, we divide out by that dx to see that the ratio we want, df/dx, is sin(x) times the derivative of x^2, plus x^2 times the derivative of sine.
And nothing we've done here is specific to sine or to x^2. This same line of reasoning would work for any two functions g and h. And sometimes people like to remember this pattern with a certain mnemonic that you kind of sing in your head: "left d right, right d left." In this example, where we have sin(x) * x^2, "left d right" means you take that left function, sin(x), times the derivative of the right, in this case 2x. Then you add on "right d left": that right function, x^2, times the derivative of the left one, cos(x). Now, out of context, presented as a rule to remember, I think this would feel pretty strange, don't you? But when you actually think of this adjustable box, you can see what each of those terms represents: "left d right" is the area of that little bottom rectangle, and "right d left" is the area of that rectangle on the side.
By the way, I should mention that if you multiply by a constant, say 2 * sin(x), things end up a lot simpler. The derivative is just the constant multiplied by the derivative of the function, in this case 2 * cos(x). I'll leave it to you to pause and ponder and verify that that makes sense.
Aside from addition and multiplication, the other common way to combine functions, and believe me, this one comes up all the time, is to shove one inside the other: function composition. For example, maybe we take the function x^2 and we just shove it on inside sin(x) to get this new function, sin(x^2). What do you think the derivative of that new function is?
To think this one through, I'm going to choose yet another way to visualize things, just to emphasize that in creative math we've got lots of options. I'll put up three different number lines. The top one is going to hold the value of x, the second one is going to hold the value of x^2, and that third line is going to hold the value of sin(x^2). That is, the function x^2 gets you from line 1 to line 2, and the function sine gets you from line 2 to line 3. As I shift around this value of x, maybe moving it up to the value 3, that second value stays pegged to whatever x^2 is, in this case moving up to 9, and that bottom value, being sin(x^2), is going to go to whatever sin(9) happens to be.
So for the derivative, let's again start by just nudging that x value by some little dx, and I always think that it's helpful to think of x as starting at some actual concrete number, maybe 1.5 in this case. The resulting nudge to that second value, the change in x^2 caused by such a dx, is d(x^2). And we could expand this like we have before, as 2x * dx, which for our specific input would be 2 * 1.5 * dx, but it actually helps to keep things written as d(x^2), at least for now. And in fact, I'm going to go one step further: I'm going to give a new name to this x^2, maybe h, so that instead of writing d(x^2) for this nudge, we write dh. And this makes it easier to think about that third value, which is now pegged at sin(h). Its change is d(sin(h)), the tiny change caused by the nudge dh. And by the way, the fact that it's moving to the left while the dh bump is going to the right just means that this change d(sin(h)) is going to be some kind of negative number.
And once again, we can use our knowledge of the derivative of the sine: this d(sin(h)) is going to be about cos(h) * dh. That's what it means for the derivative of sine to be cosine. And unfolding things, we can just replace that h with x^2 again, so we know that that bottom nudge is going to have a size of cos(x^2) * d(x^2). And in fact, let's unfold things even further: that intermediate nudge d(x^2) is going to be about 2x * dx. And it's always a good habit to remind yourself of what an expression like this actually means. In this case, where we started at x = 1.5 up top, this whole expression is telling us that the size of the nudge on that third line is going to be about cos(1.5^2) * 2 * 1.5 times whatever the size of dx was. It's proportional to the size of dx, and this derivative is giving us that proportionality constant.
Notice what we came out with here: we have the derivative of the outside function, and it's still taking in the unaltered inside function, and then we're multiplying it by the derivative of that inside function. Again, there is nothing special about sin(x) or x^2. If you have any two functions g(x) and h(x), the derivative of their composition g(h(x)) is going to be the derivative of g evaluated on h, multiplied by the derivative of h. This pattern right here is what we usually call the chain rule.
Notice, for the derivative of g, I'm writing it as dg/dh instead of dg/dx. On the symbolic level, this is a reminder that the thing you plug into that derivative is still going to be that intermediary function h. But more than that, it's an important reflection of what this derivative of the outer function actually represents. Remember, in our three-line setup, when we took the derivative of the sine on that bottom line, we expanded the size of that nudge d(sin(h)) as cos(h) * dh. This was because we didn't immediately know how the size of that bottom nudge depended on x; that's kind of the whole thing we were trying to figure out. But we could take the derivative with respect to that intermediate variable h, that is, figure out how to express the size of that nudge on the third line as some multiple of dh, the size of the nudge on the second line. And it was only after that that we unfolded further by figuring out what dh was.
So in this chain rule expression, we're saying: look at the ratio between a tiny change in g, the final output, and the tiny change in h that caused it, h being the value that we plug into g. Then multiply that by the tiny change in h divided by the tiny change in x that caused it. So notice, those dh's cancel out, and they give us a ratio between the change in that final output and the change to the input that, through a certain chain of events, brought it about. And that cancellation of dh is not just a notational trick; that is a genuine reflection of what's going on with the tiny nudges that underpin everything we do with derivatives.
So those are the three basic tools to have in your belt to handle derivatives of functions that combine a lot of smaller things. You've got the sum rule, the product rule, and the chain rule. And I'll be honest with you, there is a big difference between knowing what the chain rule is and what the product rule is, and actually being fluent with applying them in even the most hairy of situations. Watching videos, any videos, about the mechanics of calculus is never going to substitute for practicing those mechanics yourself and building up the muscles to do these computations yourself. I really wish that I could offer to do that for you, but I'm afraid the ball is in your court, my friend, to seek out the practice. What I can offer, and what I hope I have offered, is to show you where these rules actually come from, to show that they're not just something to be memorized and hammered away, but they're natural patterns, things that you too could have discovered just by patiently thinking through what a derivative actually means. [Music]
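One way to start the practice the transcript recommends is to check each rule numerically with a small but finite nudge. The sketch below is not from the video: the helper name numerical_derivative, the central-difference formula, the step size 1e-6, and the test point x = 1.5 (the concrete value used in the chain rule discussion) are all illustrative assumptions.

```python
import math


def numerical_derivative(f, x, dx=1e-6):
    """Estimate df/dx with a small finite nudge (central difference, an assumed choice)."""
    return (f(x + dx) - f(x - dx)) / (2 * dx)


# The three example functions discussed in the transcript.
def sum_example(x):
    return math.sin(x) + x**2


def product_example(x):
    return math.sin(x) * x**2


def composition_example(x):
    return math.sin(x**2)


# The derivatives that the sum, product, and chain rules predict for them.
def sum_rule(x):
    return math.cos(x) + 2 * x


def product_rule(x):
    # "left d right" plus "right d left"
    return math.sin(x) * 2 * x + x**2 * math.cos(x)


def chain_rule(x):
    # derivative of the outer function at h = x^2, times the derivative of h
    return math.cos(x**2) * 2 * x


x = 1.5  # hypothetical test point
checks = {
    "sum": (sum_example, sum_rule),
    "product": (product_example, product_rule),
    "composition": (composition_example, chain_rule),
}

errors = {}
for name, (f, rule) in checks.items():
    estimate = numerical_derivative(f, x)
    errors[name] = abs(estimate - rule(x))
    print(f"{name:12s} numerical={estimate:.6f} rule={rule(x):.6f}")
```

Each printed pair should agree to several decimal places. The composition value comes out negative at x = 1.5, which matches the transcript's remark that the bottom nudge moves to the left there.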
Original Description
A visual explanation of what the chain rule and product rule are, and why they are true.
Help fund future projects: https://www.patreon.com/3blue1brown
This video was sponsored by Brilliant: https://brilliant.org/3b1b
An equally valuable form of support is to simply share some of the videos.
Special thanks to these supporters: http://3b1b.co/lessons/chain-rule-and-product-rule#thanks
Home page: https://www.3blue1brown.com
Series like this one are funded largely by the community, through Patreon, where supporters get early access as the series is being produced.
http://3b1b.co/support
Timestamps:
0:00 - Intro
1:48 - Sum rule
4:13 - Product rule
8:41 - Chain rule
14:36 - Outro
Thanks to these viewers for their contributions to translations
Hebrew: Omer Tuchfeld
Italian: adilatte, ang
Vietnamese: ngvutuan2811
------------------
3blue1brown is a channel about animating math, in all senses of the word animate. And you know the drill with YouTube, if you want to stay posted about new videos, subscribe, and click the bell to receive notifications (if you're into that).
If you are new to this channel and want to see more, a good place to start is this playlist: http://3b1b.co/recommended
Various social media stuffs:
Website: https://www.3blue1brown.com
Twitter: https://twitter.com/3Blue1Brown
Patreon: https://patreon.com/3blue1brown
Facebook: https://www.facebook.com/3blue1brown
Reddit: https://www.reddit.com/r/3Blue1Brown