TensorFlow (C2W3L11)

DeepLearningAI · Beginner ·🧬 Deep Learning ·8y ago

Skills: ML Maths Basics 90% · Supervised Learning 80% · ML Pipelines 80%

Key Takeaways

This video demonstrates the use of TensorFlow for gradient descent optimization and cost function minimization, covering topics such as TensorFlow program structure, automatic differentiation, and computation graphs.

Full Transcript

Welcome to the last video for this week. There are many great deep learning programming frameworks; one of them is TensorFlow. I'm excited to help you start to learn to use TensorFlow. What I want to do in this video is show you the basic structure of a TensorFlow program, and then leave you to learn more of the details and practice them yourself in this week's programming exercise. This week's programming exercise will take some time to do, so please be sure to leave some extra time for it.

As the motivating problem, let's say that you have some cost function J that you want to minimize. For this example I'm going to use this highly simple cost function, J(w) = w² − 10w + 25. So that's the cost function. You might notice that this function is actually (w − 5)²; if you expand out this quadratic, you get the expression above. And so the value of w that minimizes this is w = 5. But let's say we didn't know that, and you just have this function. Let's see how you can implement something in TensorFlow to minimize this, because a program with a very similar structure can be used to train neural networks, where you can have some complicated cost function J(w, b) depending on all the parameters of your neural network. Then, similarly, you'll be able to use TensorFlow to automatically try to find values of w and b that minimize this cost function. But let's start with the simpler example on the left.

So I'm running Python in my Jupyter notebook. To start up TensorFlow, you import numpy as np, and it's idiomatic to use import tensorflow as tf. Next, let me define the parameter w. In TensorFlow, you're going to use tf.Variable to define a parameter, with dtype equals tf.float32. And then let's define the cost function. Remember, the cost function was w² − 10w + 25. So let me use tf.add: I'm going to have w squared plus tf.multiply, since the second term was −10 times w, and then I'm going to add that to 25. So let me put another tf.add over there.
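As a rough sketch of what the nested calls described above amount to, the snippet below uses two hypothetical plain-Python helpers, `add` and `multiply`, as stand-ins for `tf.add` and `tf.multiply` (they are not TensorFlow functions) and checks that the nested form agrees with (w − 5)². The commented lines are a reconstruction of the TensorFlow 1.x-style definitions from the spoken description; the initial value of 0 is taken from later in the lecture.

```python
# Hypothetical stand-ins for tf.add / tf.multiply, operating on plain floats.
def add(a, b):
    return a + b


def multiply(a, b):
    return a * b


def cost(w):
    # Same nesting as the TensorFlow 1.x-style version described in the lecture
    # (reconstructed, not copied from the notebook):
    #   w = tf.Variable(0, dtype=tf.float32)
    #   cost = tf.add(tf.add(w**2, tf.multiply(-10., w)), 25)
    return add(add(w ** 2, multiply(-10.0, w)), 25.0)


# J(w) = w^2 - 10w + 25 is the expanded form of (w - 5)^2.
for w in [0.0, 1.0, 5.0, 7.5]:
    assert abs(cost(w) - (w - 5.0) ** 2) < 1e-12

print(cost(0.0))  # 25.0
print(cost(5.0))  # 0.0
```

The point of the nesting is only that each term of the quadratic is combined by an explicit function call; the lecture later shows that the ordinary `+`, `-`, and `*` operators can be used instead.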
So that defines the cost J that we had. And then I'm going to write train equals tf.train.GradientDescentOptimizer. Let's use a learning rate of 0.01, and the goal is to minimize the cost. Finally, the following few lines are quite idiomatic: init equals tf.global_variables_initializer(), and then session equals tf.Session(), to start a TensorFlow session, and session.run(init) to initialize the global variables. And then, for TensorFlow to evaluate a variable, we're going to use session.run(w). We haven't done anything yet. So with the lines above, we've initialized w to zero and defined a cost function, and we've defined train to be our learning algorithm, which uses a gradient descent optimizer to minimize the cost function. But we haven't actually run the learning algorithm yet. So with session.run we evaluate w, and then we print the result. If you run that, it evaluates w to be equal to zero, because we haven't run anything yet.

Now let's do session.run(train). What this will do is run one step of gradient descent. And then let's evaluate the value of w after one step of gradient descent and print that. So we do that, and after one step of gradient descent, w is now 0.1. Let's now run a thousand iterations of gradient descent, so we run train, and let's then print session.run(w). So this runs a thousand iterations of gradient descent, and at the end w ends up being 4.9999. Remember, we said that we're minimizing (w − 5)², so the optimal value of w is 5, and it got very close to this.

So I hope this gives you a sense of the broad structure of a TensorFlow program. As you do the programming exercise and play with more TensorFlow code yourself, some of these functions that I'm using here will become more familiar. Some things to notice about this: w is the parameter we're trying to optimize, so we're going to declare that as a variable. And notice that all we had to do was define a cost function using these add and
multiply and so on functions, and TensorFlow knows automatically how to take derivatives with respect to the add and multiply as well as other functions. That's why you only have to implement, basically, forward prop, and it can figure out how to do the backprop, or the gradient computation, because that's already built in to the add and multiply as well as the squaring functions.

By the way, in case this notation seems really ugly, TensorFlow actually has overloaded the computation for the usual plus, minus, and so on. So you can also just write the cost in this nicer format, and comment that out, and if you rerun this you get the same result. So once w is declared to be a TensorFlow variable, the squaring, multiplication, adding, and subtraction operations are overloaded, so you don't need to use this uglier syntax that I had above.

Now, there's just one more feature of TensorFlow that I want to show you. This example minimized a fixed function of w. Often the function you want to minimize is a function of your training set. So you have some training data x, and when you're training a neural network, the training data x can change. So how do you get training data into a TensorFlow program? I'm going to define x, which you can think of as playing the role of the training data, or really the training data would have both x and y, but we only have x in this example. So I'm going to define x to be a placeholder, and it's going to be of type float32, and let's make this a 3-by-1 array. And what I'm going to do is, whereas the cost here has fixed coefficients in front of the three terms in this quadratic (it was 1 times w² minus 10 times w plus 25), we could turn these numbers, 1, −10, and 25, into data. So what I'm going to do is replace the cost with cost equals x[0][0] times w² plus x[1][0] times w plus x[2][0] times 1. So now x becomes sort of like data that controls the coefficients of this quadratic function. And this placeholder function tells TensorFlow that x is something that
you'll provide the values for later. So let's define another array, coefficients equals np.array with 1, −10, and, yes, the last value is 25. So that's going to be the data that we're going to plug into x. Finally, we need a way to get this array coefficients into the variable x, and the syntax to do that is, during the training step, that the values will need to be provided for x. So I'm going to set here feed_dict equals {x: coefficients}. And I'm going to copy and paste this and put that here as well. All right, hopefully I didn't have any syntax errors. Let's try rerunning this, and we get the same results, hopefully, as before.

And now if you want to change the coefficients of this quadratic function, let's say you take this −10 and change it to −20, and let's change this 25 to 100. So this is now the function (w − 10)². And if I rerun this, hopefully I find that the value that minimizes (w − 10)² is w equals 10. Let's see. Cool, great, we get w very close to 10 after running a thousand iterations of gradient descent.

So what you'll see more of when you do the programming exercise is that a placeholder in TensorFlow is a variable whose value you assign later. This is a convenient way to get your training data into the cost function, and the way you get your data into the cost function is with this syntax: when you're running a training iteration, you use the feed_dict to set x to be equal to the coefficients here. And if you're doing mini-batch gradient descent, where on each iteration you need to plug in a different mini-batch, then on different iterations you use the feed_dict to feed in different subsets of your training set, different mini-batches, into where your cost function is expecting to see data.

So hopefully this gives you a sense of what TensorFlow can do. The thing that makes it so powerful is that all you need to do is specify how to compute the cost function, and then it takes derivatives, and it can apply a gradient optimizer or an Adam optimizer
or some other optimizer with just, you know, pretty much one or two lines of code. So here's the code again. I've cleaned this up just a little bit. And in case some of these functions or variables seem a little bit mysterious to you still, they will become more familiar after you've practiced with them a couple of times by working through the programming exercise.

Just one last thing I want to mention. These three lines of code are quite idiomatic in TensorFlow, and what some programmers will do is use this alternative format, which basically does the same thing: set session to tf.Session() to start the session, then use the session to run init, and then use the session to evaluate, say, w, and then print the result. But this with construction is used in a number of TensorFlow programs as well. It more or less means the same thing as the thing on the left, but the with command in Python is a little bit better at cleaning up in case there's an error or exception while executing this inner loop. So you'll see this in the programming exercise as well.

So what is this code really doing? Let's focus on this equation. The heart of a TensorFlow program is something to compute a cost, and then TensorFlow automatically figures out the derivatives and how to minimize that cost. So what this equation, or what this line of code, is doing is allowing TensorFlow to construct a computation graph. And a computation graph does the following: it takes x[0][0], it takes w, and then w gets squared, and then x[0][0] gets multiplied with w², so you have x[0][0] times w², and so on, right? And eventually this gets built up to compute x[0][0] times w² plus x[1][0] times w plus, and so on. And so eventually you get the cost function. And I guess the last term to be added would be x[2][0], where it gets added to give the cost. I won't write out the full formula for the cost. And the nice thing about TensorFlow is that by implementing basically forward propagation through this
computation graph to compute the cost, TensorFlow already has built in all the necessary backward functions. So remember how training a deep neural network has a set of forward functions and a set of backward functions. Programming frameworks like TensorFlow have already built in the necessary backward functions, which is why, by using the built-in functions to compute the forward function, it can automatically do the backward functions as well, to implement backpropagation through even very complicated functions and compute derivatives for you. So that's why you don't need to explicitly implement backprop. This is one of the things that makes programming frameworks help you become really efficient.

If you look at the TensorFlow documentation, I just want to point out that the TensorFlow documentation uses a slightly different notation than I did for drawing the computation graph. It uses x[0][0] and w, and then, rather than writing the value, like w², the TensorFlow documentation tends to just write the operation. So this would be a square operation, and then these two get combined in a multiplication operation, and so on. And then at the final node, I guess, there'd be an addition operation where you add x[2][0] to find the final value. So for the purposes of this class, I thought that this notation for the computation graph would be easier for you to understand. But if you look at the TensorFlow documentation, as you look at the computation graphs in the documentation, you'll see this alternative convention where the nodes are labeled with the operations rather than with the values. But both of these representations represent basically the same computation graph.

And there are a lot of things that you can do with just one line of code in programming frameworks. For example, if you don't want to use gradient descent but instead you want to use the Adam optimizer, by changing this line of code you can very quickly swap in a better optimization algorithm. So all the modern deep learning programming
frameworks support things like this, and that makes it really easy for you to code up even pretty complex neural networks. So I hope this was helpful for giving you a sense of the typical structure of a TensorFlow program.

To recap the material from this week: you saw how to systematically organize the hyperparameter search process. We also talked about batch normalization and how you can use that to speed up training of your neural networks. And finally, we talked about deep learning programming frameworks. There are many great programming frameworks, and we had this last video focusing on TensorFlow. So with that, I hope you enjoy this week's programming exercise, and that it helps you gain even more familiarity with these ideas.
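The lecture's point that every built-in forward operation comes paired with a backward function can be illustrated with a toy computation graph in plain Python. This is only a sketch of the idea, not how TensorFlow is implemented: the names `Node`, `add`, `multiply`, `square`, `backward`, and `minimize` are all hypothetical, and the coefficients are passed in as ordinary arguments to play the role that the placeholder and feed_dict play in the lecture.

```python
class Node:
    """A value in the graph, plus the local gradients toward its inputs."""

    def __init__(self, value, parents=()):
        self.value = value
        self.parents = parents  # tuple of (parent_node, d(self)/d(parent))
        self.grad = 0.0


# Each forward operation records the information its backward step needs.
def add(a, b):
    return Node(a.value + b.value, ((a, 1.0), (b, 1.0)))


def multiply(a, b):
    return Node(a.value * b.value, ((a, b.value), (b, a.value)))


def square(a):
    return Node(a.value ** 2, ((a, 2.0 * a.value),))


def backward(node, upstream=1.0):
    """Chain rule: sum the gradient over every path from the output."""
    node.grad += upstream
    for parent, local_grad in node.parents:
        backward(parent, upstream * local_grad)


def minimize(coefficients, learning_rate=0.01, steps=1000):
    """Gradient descent on x0 * w**2 + x1 * w + x2, starting from w = 0."""
    w_value = 0.0
    for _ in range(steps):
        # Forward pass: build the graph for the current w and the fed-in data.
        x0, x1, x2 = (Node(float(c)) for c in coefficients)
        w = Node(w_value)
        cost = add(add(multiply(x0, square(w)), multiply(x1, w)), x2)
        # Backward pass: only the forward expression above was written by hand.
        backward(cost)
        w_value -= learning_rate * w.grad
    return w_value


print(minimize([1.0, -10.0, 25.0], steps=1))  # one step from w = 0
print(minimize([1.0, -10.0, 25.0]))           # approaches 5
print(minimize([1.0, -20.0, 100.0]))          # approaches 10
```

Only the forward expression for `cost` is written explicitly; the derivative 2·x0·w + x1 is never typed in, because it is assembled from the local gradients stored by `square`, `multiply`, and `add`. The recursive `backward` is kept simple for readability and would be inefficient on large graphs.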

Original Description

Take the Deep Learning Specialization: http://bit.ly/38u7YIW
Check out all our courses: https://www.deeplearning.ai
Subscribe to The Batch, our weekly newsletter: https://www.deeplearning.ai/thebatch
Follow us:
Twitter: https://twitter.com/deeplearningai_
Facebook: https://www.facebook.com/deeplearningHQ/
LinkedIn: https://www.linkedin.com/company/deeplearningai

This video teaches the basics of TensorFlow and how to use it for gradient descent optimization and cost function minimization. It covers topics such as TensorFlow program structure, automatic differentiation, and computation graphs, and provides practical examples of how to implement these concepts.

Key Takeaways

Define a parameter as a TensorFlow variable
Define a cost function using tf.add and tf.multiply
Initialize the parameter to zero
Define a train operation using TensorFlow's tf.train.GradientDescentOptimizer
Run one step of gradient descent and evaluate the parameter
Use placeholders to input training data
Run gradient descent to minimize the cost function
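
The steps listed above can be sketched as one short program. This is a reconstruction from the spoken description rather than the lecture's exact notebook. It assumes TensorFlow 2.x is installed and reaches the 1.x-style API used in the lecture through `tensorflow.compat.v1`; the function name `minimize_quadratic` is hypothetical.

```python
import numpy as np
import tensorflow.compat.v1 as tf

# Assumption: running under TensorFlow 2.x, where graph mode must be requested.
tf.disable_eager_execution()


def minimize_quadratic(coefficients, steps=1000, learning_rate=0.01):
    """Minimize x[0][0]*w**2 + x[1][0]*w + x[2][0] over w by gradient descent."""
    graph = tf.Graph()  # fresh graph so repeated calls do not pile up nodes
    with graph.as_default():
        w = tf.Variable(0, dtype=tf.float32)    # parameter, initialized to zero
        x = tf.placeholder(tf.float32, [3, 1])  # data supplied at run time
        cost = x[0][0] * w ** 2 + x[1][0] * w + x[2][0]
        train = tf.train.GradientDescentOptimizer(learning_rate).minimize(cost)
        init = tf.global_variables_initializer()
        with tf.Session() as session:
            session.run(init)
            for _ in range(steps):
                # feed_dict plugs the coefficients into the placeholder x.
                session.run(train, feed_dict={x: coefficients})
            return float(session.run(w))


coefficients = np.array([[1.0], [-10.0], [25.0]])
print(minimize_quadratic(coefficients, steps=1))     # one step from w = 0
print(minimize_quadratic(coefficients, steps=1000))  # should approach 5
```

Swapping `tf.train.GradientDescentOptimizer` for `tf.train.AdamOptimizer` on the `train` line is the one-line optimizer change the lecture mentions.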

💡 TensorFlow can automatically compute derivatives and apply gradient optimizers with minimal code, making it a powerful tool for machine learning tasks.
