TensorFlow (C2W3L11)
Key Takeaways
This video demonstrates the use of TensorFlow for gradient descent optimization and cost function minimization, covering topics such as TensorFlow program structure, automatic differentiation, and computation graphs.
Full Transcript
Welcome to the last video for this week. There are many great deep learning programming frameworks, and one of them is TensorFlow. I'm excited to help you start to learn to use TensorFlow. What I want to do in this video is show you the basic structure of a TensorFlow program, and then leave you to practice and learn more of the details yourself in this week's programming exercise. This week's programming exercise will take some time to do, so please be sure to leave some extra time for it.
As the motivating problem, let's say that you have some cost function J that you want to minimize. For this example, I'm going to use this highly simple cost function: J(w) = w^2 - 10w + 25. You might notice that this function is actually (w - 5)^2; if you expand out this quadratic, you get the expression above. So the value of w that minimizes it is w = 5. But let's say we didn't know that, and you just have this function. Let's see how you can implement something in TensorFlow to minimize this, because a very similar structure of program can be used to train neural networks, where you can have some complicated cost function J(w, b) depending on all the parameters of your neural network. Then, similarly, you'll be able to use TensorFlow to automatically try to find values of w and b that minimize this cost function. But let's start with the simpler example on the left.
So I'm running Python in my Jupyter notebook. To start up TensorFlow, you import numpy as np, and it's idiomatic to use import tensorflow as tf. Next, let me define the parameter w. In TensorFlow, you use tf.Variable to define a parameter, with dtype=tf.float32. Then let's define the cost function. Remember the cost function was w^2 - 10w + 25, so let me use tf.add: I'm going to have w squared plus tf.multiply, so the second term was -10 times w, and then I'm going to add that to 25, so let me put another tf.add over there.
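The claim above that the cost is really (w - 5)^2 can be checked by expanding the square. The derivative line below is a standard calculus step that the transcript does not state explicitly; it is included only to show why w = 5 is the minimizer.

```latex
\begin{align*}
(w - 5)^2 &= w^2 - 10w + 25 = J(w), \\
\frac{dJ}{dw} &= 2w - 10 = 2(w - 5), \\
\frac{dJ}{dw} = 0 &\iff w = 5 .
\end{align*}
```

Since J(w) is a square, it is never negative, and it equals zero only at w = 5.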
So that defines the cost J that we had. Then I'm going to write train equals tf.train.GradientDescentOptimizer, let's use a learning rate of 0.01, and the goal is to minimize the cost. Finally, the following few lines are quite idiomatic: init equals tf.global_variables_initializer(), and then session equals tf.Session(), so that starts a TensorFlow session, and session.run(init) to initialize the global variables. Then, for TensorFlow to evaluate a variable, we're going to use session.run(w). We haven't done anything yet. With this line above, we initialize w to zero and define a cost function, and we define train to be our learning algorithm, which uses a gradient descent optimizer to minimize the cost function. But we haven't actually run the learning algorithm yet. So session.run(w) evaluates w, and then we print the result. If you run that, it evaluates w to be equal to zero, because you haven't run anything yet.
Now let's do session.run(train). What this will do is run one step of gradient descent. Then let's evaluate the value of w after one step of gradient descent and print that. After one step of gradient descent, w is now 0.1. Let's now run a thousand iterations of gradient descent, so session.run(train), and then let's print session.run(w). This runs a thousand iterations of gradient descent, and at the end w ends up being 4.9999. Remember we said that we're minimizing (w - 5)^2, so the optimal value of w is 5, and it got very close to this.
So I hope this gives you a sense of the broad structure of a TensorFlow program. As you do the programming exercise and play with more TensorFlow code yourself, some of these functions that I'm using here will become more familiar. Some things to notice about this: w is the parameter we're trying to optimize, so we declare that as a variable. And notice that all we had to do was define a cost function using these add and
multiply, and so on, functions, and TensorFlow knows automatically how to take derivatives with respect to the add and multiply as well as other functions. That's why you only have to implement basically forward prop, and it can figure out how to do the backprop or the gradient computation, because that's already built into the add and multiply as well as the squaring functions.
By the way, in case this notation seems really ugly, TensorFlow has actually overloaded the computation for the usual plus, minus, and so on, so you can also just write the cost in this nicer format, comment the earlier version out, and if you run this you get the same result. So once w is declared to be a TensorFlow variable, the squaring, multiplication, adding, and subtraction operations are overloaded, so you don't need to use the uglier syntax that I had above.
Now there's just one more feature of TensorFlow that I want to show you. This example minimizes a fixed function of w. What if the function you want to minimize is a function of your training set? You have some training data x, and when you're training a neural network the training data x can change. So how do you get training data into a TensorFlow program? I'm going to define x, which you can think of as playing the role of the training data, or really the training data would be both x and y, but we only have x in this example. So I'm going to define x to be a placeholder, and it's going to be of type float32, and let's make this a 3 by 1 array. Whereas the cost here has fixed coefficients in front of the three terms in this quadratic, it's 1 times w^2 minus 10 times w plus 25, we could turn these numbers 1, -10, and 25 into data. So what I'm going to do is replace the cost with cost equals x[0][0] times w^2 plus x[1][0] times w plus x[2][0]. So now x becomes sort of like data that controls the coefficients of this quadratic function, and this placeholder function tells TensorFlow that x is something that
you will provide the values for later. So let's define another array, coefficients equals np.array with the values 1, -10, and the last value is 25. That's going to be the data that we're going to plug into x. Finally, we need a way to get this array coefficients into the variable x, and the syntax to do that is, during the training step, the values will need to be provided for x. So I'm going to set here feed_dict={x: coefficients}, and I'm going to copy and paste this and put that here as well. All right, hopefully I didn't have any syntax errors. Let's try rerunning this, and we get the same results, hopefully, as before.
Now, if you want to change the coefficients of this quadratic function, let's say you take this -10 and change it to -20, and let's change this to 100. So this is now the function (w - 10)^2, and if I rerun this, hopefully I find that the value that minimizes (w - 10)^2 is w = 10. Let's see. Cool, great, we got w very close to 10 after running a thousand iterations of gradient descent.
So what you'll see more of when you do the programming exercise is that a placeholder in TensorFlow is a variable whose value you assign later, and this is a convenient way to get your training data into the cost function. The way you get your data into the cost function is with this syntax: when you're running a training iteration, you use the feed_dict to set x to be equal to the coefficients here. And if you're doing mini-batch gradient descent, where on each iteration you need to plug in a different mini-batch, then on different iterations you use the feed_dict to feed in different subsets of your training set, different mini-batches, into where your cost function is expecting to see data.
So hopefully this gives you a sense of what TensorFlow can do. The thing that makes it so powerful is that all you need to do is specify how to compute the cost function, and then it takes derivatives, and it can apply a gradient descent optimizer, or an Adam optimizer,
or some other optimizer with pretty much one or two lines of code. So here's the code again; I've cleaned it up just a little bit. In case some of these functions or variables still seem a little mysterious to you, they will become more familiar after you practice with them a couple of times by working through the programming exercise.
Just one last thing I want to mention. These three lines of code are quite idiomatic in TensorFlow, and what some programmers will do is use this alternative format, which basically does the same thing: set session to tf.Session() to start the session, then use the session to run init, and then use the session to evaluate, say, w, and print the result. But this with construction is used in a number of TensorFlow programs as well. It more or less means the same thing as the thing on the left, but the with command in Python is a little bit better at cleaning up in case there's an error or exception while executing this inner loop. So you'll see this in the programming exercise as well.
So what is this code really doing? Let's focus on this equation. The heart of a TensorFlow program is something to compute a cost, and then TensorFlow automatically figures out the derivatives and how to minimize that cost. What this equation, or this line of code, is doing is allowing TensorFlow to construct a computation graph. The computation graph does the following: it takes x[0][0], it takes w, then w gets squared, and then x[0][0] gets multiplied with w^2, so you have x[0][0] times w^2, and so on. Eventually this gets built up to compute x[0][0] times w^2 plus x[1][0] times w plus and so on, and so eventually you get the cost function. I guess the last term to be added would be x[2][0], where it gets added to give the cost; I won't write out the full formula for the cost. The nice thing about TensorFlow is that by implementing basically forward propagation through this
computation graph to compute the cost, TensorFlow has already built in all the necessary backward functions. Remember how training a deep neural network has a set of forward functions and a set of backward functions. Programming frameworks like TensorFlow have already built in the necessary backward functions, which is why, by using the built-in functions to compute the forward function, it can automatically do the backward functions as well, to implement backpropagation through even very complicated functions and compute derivatives for you. That's why you don't need to explicitly implement backprop, and this is one of the things that makes programming frameworks help you become really efficient.
If you look at the TensorFlow documentation, I just want to point out that it uses a slightly different notation than I did for drawing the computation graph. It uses x[0][0] and w, and then, rather than writing the value, like w^2, the TensorFlow documentation tends to just write the operation. So this would be a square operation, and these two get combined in a multiplication operation, and so on. Then in the final node, I guess, there would be an addition operation where you add x[2][0] to find the final value. For the purposes of this class, I thought that this notation for the computation graph would be easier for you to understand. But if you look at the computation graphs in the TensorFlow documentation, you'll see this alternative convention, where the nodes are labeled with the operations rather than with the values. Both of these representations represent basically the same computation graph.
There are a lot of things that you can do with just one line of code in programming frameworks. For example, if you don't want to use gradient descent but instead want to use the Adam optimizer, by changing this line of code you can very quickly swap in a better optimization algorithm. So all the modern deep learning programming
frameworks support things like this, which makes it really easy for you to code up even pretty complex neural networks.
So I hope this was helpful for giving you a sense of the typical structure of a TensorFlow program. To recap the material from this week: you saw how to systematically organize the hyperparameter search process; we also talked about batch normalization and how you can use that to speed up training of your neural networks; and finally, we talked about programming frameworks for deep learning. There are many great programming frameworks, and we had this last video focusing on TensorFlow. With that, I hope you enjoy this week's programming exercise, and that it helps you gain even more familiarity with these ideas.
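The idea that you write only the forward computation and the framework supplies the backward pass can be sketched in plain Python, without TensorFlow. This is a minimal illustration and not how TensorFlow is implemented: `Node`, `backward`, `build_cost`, and `minimize` are hypothetical names invented here. The learning rate of 0.01, the starting value w = 0, and the two coefficient sets come from the video; the coefficient list stands in for the data fed in through the placeholder.

```python
class Node:
    """One value in a tiny computation graph (illustrative, not a TensorFlow class)."""

    def __init__(self, value, parents=()):
        self.value = float(value)
        # Each entry is (parent_node, d(self)/d(parent)): the built-in backward rule.
        self.parents = parents
        self.grad = 0.0

    def __add__(self, other):
        other = other if isinstance(other, Node) else Node(other)
        return Node(self.value + other.value, ((self, 1.0), (other, 1.0)))

    def __mul__(self, other):
        other = other if isinstance(other, Node) else Node(other)
        return Node(self.value * other.value,
                    ((self, other.value), (other, self.value)))

    # Let plain numbers appear on the left, e.g. 2.0 * node.
    __radd__ = __add__
    __rmul__ = __mul__


def backward(node, upstream=1.0):
    """Accumulate d(output)/d(node) by applying the chain rule along every path."""
    node.grad += upstream
    for parent, local_derivative in node.parents:
        backward(parent, upstream * local_derivative)


def build_cost(w, coefficients):
    """Forward pass only: cost = x0 * w^2 + x1 * w + x2."""
    x0, x1, x2 = coefficients
    return x0 * w * w + x1 * w + x2


def minimize(coefficients, learning_rate=0.01, steps=1000, w_start=0.0):
    """Plain gradient descent; the coefficients play the role of the fed-in data."""
    w_value = w_start
    for _ in range(steps):
        w = Node(w_value)
        cost = build_cost(w, coefficients)   # forward pass builds the graph
        backward(cost)                       # derivatives come from the graph
        w_value -= learning_rate * w.grad    # one gradient descent step
    return w_value


if __name__ == "__main__":
    print(minimize([1.0, -10.0, 25.0], steps=1))   # one step from w = 0
    print(minimize([1.0, -10.0, 25.0]))            # approaches 5
    print(minimize([1.0, -20.0, 100.0]))           # approaches 10
```

Note that `build_cost` contains no derivative code at all. Changing the coefficients changes the function being minimized without touching the training loop, which mirrors the role of the placeholder and feed_dict in the video.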
Original Description
Take the Deep Learning Specialization: http://bit.ly/38u7YIW
Check out all our courses: https://www.deeplearning.ai
Subscribe to The Batch, our weekly newsletter: https://www.deeplearning.ai/thebatch
Follow us:
Twitter: https://twitter.com/deeplearningai_
Facebook: https://www.facebook.com/deeplearningHQ/
Linkedin: https://www.linkedin.com/company/deeplearningai