Computing Neural Network Output (C1W3L03)

DeepLearningAI · Beginner · 📐 ML Fundamentals · 8y ago

Skills: ML Maths Basics 90% · Supervised Learning 80% · ML Pipelines 70%

Key Takeaways

This video walks through how a two-layer neural network (one hidden layer) computes its output for a single input. Each hidden node performs the same two steps as logistic regression, first computing z and then applying a sigmoid activation. The per-node equations are then vectorized into matrix form, so the whole computation takes four lines of code.

Full Transcript

In the last video, you saw what a single-hidden-layer neural network looks like. In this video, let's go through the details of exactly how this neural network computes its outputs. What you'll see is that it is like logistic regression, but repeated a lot of times. Let's take a look.

So this is what a two-layer neural network looks like. Let's go more deeply into exactly what this neural network computes. Now, we said before that the circle in logistic regression really represents two steps of computation: first you compute z as follows, and second you compute the activation as a sigmoid function of z. A neural network just does this a lot more times.

Let's start by focusing on just one of the nodes in the hidden layer. Let's look at the first node in the hidden layer; I've grayed out the other nodes for now. Similar to logistic regression on the left, this node in the hidden layer does two steps of computation. The first step, which you can think of as the left half of this node, computes $z^{[1]}_1 = w^{[1]T}_1 x + b^{[1]}_1$. The notation we'll use is that these are all quantities associated with the first hidden layer, so that's why we have a bunch of square brackets there, and this is the first node in the hidden layer, so that's why we have the subscript 1 over there. So first it does that, and then the second step is that it computes $a^{[1]}_1 = \sigma(z^{[1]}_1)$, like so.

For both z and a, the notational convention is that in $a^{[l]}_i$, the l in superscript square brackets refers to the layer number, and the subscript i refers to the node in that layer. The node we're looking at is in layer 1, that is, the hidden layer, and it is node 1, so that's why the superscript and subscript are both 1. So that little circle, that first node in the neural network, represents carrying out these two steps of computation.

Now let's look at the second node in the hidden layer of the neural network. Similar to the logistic regression unit on the left, this little circle represents two steps of computation. The first step is computing z (this is still layer 1, but now it's the second node): $z^{[1]}_2 = w^{[1]T}_2 x + b^{[1]}_2$, and then $a^{[1]}_2 = \sigma(z^{[1]}_2)$. Again, feel free to pause the video if you want, so you can double-check that the superscript and subscript notation is consistent with what we have written here above in purple.

So we've talked through the first two hidden units in the neural network. Hidden units 3 and 4 also represent some computations. Now let me take this pair of equations and this pair of equations and copy them to the next slide. Here's our network, and here are the first and second pairs of equations that we worked out previously for the first and second hidden units. If you then go through and write out the corresponding equations for the third and fourth hidden units, you get the following. Just to make sure this notation is clear: this is the vector $w^{[1]}_1$, and this is the vector transposed, times x. That's what the superscript T there represents: the vector transpose.

Now, as you might have guessed, if you're actually implementing a neural network, doing this with a for loop seems really inefficient. So what we're going to do is take these four equations and vectorize them. I'm going to start by showing how to compute z as a vector, and it turns out you can do it as follows. Let me take these w's and stack them into a matrix. Then you have $w^{[1]T}_1$, so that's a row vector (the column vector transposed gives you a row vector), then $w^{[1]T}_2$, $w^{[1]T}_3$, and $w^{[1]T}_4$. By stacking those four w vectors together, you end up with a matrix. Another way to think of this is that we have four logistic regression units there, and each of the logistic regression units has a corresponding parameter vector w. By stacking those four vectors together, you end up with this 4 by 3 matrix.

If you then take this matrix and multiply it by your input features $x_1$, $x_2$, $x_3$, then by how matrix multiplication works you end up with $w^{[1]T}_1 x$, $w^{[1]T}_2 x$, $w^{[1]T}_3 x$, $w^{[1]T}_4 x$. And now let's not forget the b's: we add to this the vector $b^{[1]}_1$, $b^{[1]}_2$, $b^{[1]}_3$, $b^{[1]}_4$. So you see that each of the four rows of this outcome corresponds exactly to each of the four quantities that we had above. In other words, we've just shown that this thing is equal to $z^{[1]}_1$, $z^{[1]}_2$, $z^{[1]}_3$, $z^{[1]}_4$, as defined here. And maybe not surprisingly, we're going to call this whole thing the vector $z^{[1]}$, which is obtained by stacking up these individual z's into a column vector.

When we're vectorizing, one of the rules of thumb that might help you navigate this is that when we have different nodes in a layer, we stack them vertically. That's why, when you have $z^{[1]}_1$ through $z^{[1]}_4$, which correspond to four different nodes in the hidden layer, we stack these four numbers vertically to form the vector $z^{[1]}$. To introduce one more piece of notation: this 4 by 3 matrix here, which we obtained by stacking the lowercase $w^{[1]}_1$, $w^{[1]}_2$, and so on, we're going to call the matrix capital $W^{[1]}$. Similarly, this vector we're going to call $b^{[1]}$, and it is a 4 by 1 vector.

So now we've computed z using this vector-matrix notation. The last thing we need to do is also compute these values of a. It probably won't surprise you to see that we're going to define $a^{[1]}$ by just stacking together those activation values $a^{[1]}_1$ through $a^{[1]}_4$. Just take these four values and stack them together in a vector called $a^{[1]}$, and this is going to be $\sigma(z^{[1]})$, where this is an implementation of the sigmoid function that takes in the four elements of z and applies the sigmoid function element-wise to them.

So, just to recap, we figured out that $z^{[1]} = W^{[1]} x + b^{[1]}$ and $a^{[1]} = \sigma(z^{[1]})$. Let's just copy this to the next slide. What we see is that for the first layer of the neural network, given an input x, we have $z^{[1]} = W^{[1]} x + b^{[1]}$ and $a^{[1]} = \sigma(z^{[1]})$. The dimensions are as follows: $z^{[1]}$ is 4 by 1, which equals a 4 by 3 matrix times a 3 by 1 vector, plus a 4 by 1 vector b, and $a^{[1]}$ is 4 by 1, the same dimensions as $z^{[1]}$.

And remember that we said x is equal to $a^{[0]}$, just like $\hat{y}$ is also equal to $a^{[2]}$. So if you want, you can actually take this x and replace it with $a^{[0]}$, since $a^{[0]}$ is, if you want, an alias for the vector of input features x.

Now, through a similar derivation, you can figure out that the representation for the next layer can also be written similarly. The output layer has associated with it the parameters $W^{[2]}$ and $b^{[2]}$. $W^{[2]}$ in this case is going to be a 1 by 4 matrix, and $b^{[2]}$ is just a real number, 1 by 1. So $z^{[2]}$ is going to be a real number, written as a 1 by 1 matrix: it's a 1 by 4 thing times $a^{[1]}$, which is 4 by 1, plus $b^{[2]}$, which is 1 by 1, and so this gives you just a real number.

If you think of this last output unit as being analogous to logistic regression, which had parameters w and b, then w really plays a role analogous to $W^{[2]T}$, or $W^{[2]}$ is really $w^T$, and b is equal to $b^{[2]}$. If you were to cover up the left of this network and ignore all that for now, then this last output unit is a lot like logistic regression, except that instead of writing the parameters as w and b, we're writing them as $W^{[2]}$ and $b^{[2]}$, with dimensions 1 by 4 and 1 by 1.

So, just to recap: for logistic regression, to implement the output, or to implement prediction, you compute $z = w^T x + b$ and $\hat{y} = a = \sigma(z)$. When you have a neural network with one hidden layer, what you need to implement to compute its output is just these four equations. You can think of this as a vectorized implementation of computing the output of, first, these four logistic regression units in the hidden layer (that's what this does), and then this logistic regression in the output layer (which is what this does). I hope this description made sense, but the takeaway is that to compute the output of this neural network, all you need is those four lines of code.

So now you've seen how, given a single input feature vector x, you can compute the output of this neural network with four lines of code. Similar to what we did for logistic regression, we'll also want to vectorize across multiple training examples. We'll see that by stacking up training examples in different columns of a matrix, with just a slight modification to this, you will, similar to what you saw in logistic regression, be able to compute the output of this neural network not just on one example at a time but on, say, your entire training set at a time. So let's see the details of that in the next video.
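
The "four lines of code" mentioned at the end of the transcript might look roughly like the following NumPy sketch. It assumes the network from the video (3 input features, 4 hidden units, 1 output unit) with sigmoid activations in both layers. The function names `sigmoid` and `forward` and the randomly generated parameter values are illustrative assumptions, not code from the lecture.

```python
import numpy as np


def sigmoid(z):
    """Element-wise logistic sigmoid."""
    return 1.0 / (1.0 + np.exp(-z))


def forward(x, W1, b1, W2, b2):
    """Compute the network output for a single input column vector x."""
    z1 = W1 @ x + b1    # (4, 3) @ (3, 1) + (4, 1) -> (4, 1)
    a1 = sigmoid(z1)    # (4, 1)
    z2 = W2 @ a1 + b2   # (1, 4) @ (4, 1) + (1, 1) -> (1, 1)
    a2 = sigmoid(z2)    # (1, 1), this is y_hat
    return z1, a1, z2, a2


# Hypothetical input and parameters, chosen only to make the sketch runnable.
rng = np.random.default_rng(0)
x = rng.standard_normal((3, 1))
W1 = rng.standard_normal((4, 3)) * 0.01
b1 = np.zeros((4, 1))
W2 = rng.standard_normal((1, 4)) * 0.01
b2 = np.zeros((1, 1))

z1, a1, z2, a2 = forward(x, W1, b1, W2, b2)
print(z1.shape, a1.shape, z2.shape, a2.shape)  # (4, 1) (4, 1) (1, 1) (1, 1)
```

Keeping x, the biases, and the activations as explicit column vectors (shape `(n, 1)` rather than `(n,)`) makes the shapes match the dimensions stated in the lecture one for one.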

Original Description

Take the Deep Learning Specialization: http://bit.ly/38jCe9e
Check out all our courses: https://www.deeplearning.ai
Subscribe to The Batch, our weekly newsletter: https://www.deeplearning.ai/thebatch
Follow us:
Twitter: https://twitter.com/deeplearningai_
Facebook: https://www.facebook.com/deeplearningHQ/
Linkedin: https://www.linkedin.com/company/deeplearningai

This video teaches how to compute the output of a two-layer neural network using vectorization and matrix multiplication. It relates each hidden unit to a logistic regression unit and then derives, step by step, the vectorized equations for the hidden layer and the output layer.
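
As a worked form of the stacking step, the hidden-layer computation can be written out as below. This is a sketch in the lecture's notation, assuming the 3-input, 4-hidden-unit network used in the video: each row of the result is one of the per-node equations.

```latex
z^{[1]}
=
\underbrace{\begin{bmatrix}
  w^{[1]T}_1 \\ w^{[1]T}_2 \\ w^{[1]T}_3 \\ w^{[1]T}_4
\end{bmatrix}}_{W^{[1]}\ (4 \times 3)}
\underbrace{\begin{bmatrix}
  x_1 \\ x_2 \\ x_3
\end{bmatrix}}_{x\ (3 \times 1)}
+
\underbrace{\begin{bmatrix}
  b^{[1]}_1 \\ b^{[1]}_2 \\ b^{[1]}_3 \\ b^{[1]}_4
\end{bmatrix}}_{b^{[1]}\ (4 \times 1)}
=
\begin{bmatrix}
  w^{[1]T}_1 x + b^{[1]}_1 \\
  w^{[1]T}_2 x + b^{[1]}_2 \\
  w^{[1]T}_3 x + b^{[1]}_3 \\
  w^{[1]T}_4 x + b^{[1]}_4
\end{bmatrix}
=
\begin{bmatrix}
  z^{[1]}_1 \\ z^{[1]}_2 \\ z^{[1]}_3 \\ z^{[1]}_4
\end{bmatrix}
```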

Key Takeaways

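Compute z as w transpose x plus b for each node
Compute each node's activation as the sigmoid of z
Stack the hidden layer's four transposed weight vectors as rows to form the 4 by 3 matrix W[1]
Multiply the matrix W[1] by the input features x
Add the bias vector b[1] to the result to get z[1]
Vectorize across the nodes in a layer by stacking their values vertically
Stack the per-node biases to form the 4 by 1 vector b[1]
Repeat for the output layer and compute y_hat = a[2] = sigmoid(z[2])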
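
The steps above can be checked numerically. The following NumPy sketch computes the hidden-layer activations once with an explicit loop over the four nodes and once with the stacked matrix, then confirms the two agree. The variable names and random values are illustrative assumptions, and 1-D arrays are used here instead of column vectors for brevity.

```python
import numpy as np


def sigmoid(z):
    """Element-wise logistic sigmoid."""
    return 1.0 / (1.0 + np.exp(-z))


# Hypothetical values, chosen only to make the comparison runnable.
rng = np.random.default_rng(1)
x = rng.standard_normal(3)                      # input features x1, x2, x3
w = [rng.standard_normal(3) for _ in range(4)]  # one weight vector per hidden node
b = rng.standard_normal(4)                      # one bias per hidden node

# Node-by-node version: an explicit loop over the four hidden units.
a_loop = np.array([sigmoid(w[i] @ x + b[i]) for i in range(4)])

# Vectorized version: each w_i becomes a row of W, then one matrix-vector product.
W = np.stack(w)             # shape (4, 3)
a_vec = sigmoid(W @ x + b)  # shape (4,)

print(np.allclose(a_loop, a_vec))  # True
```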

💡 Vectorizing across the nodes in a layer replaces an explicit for loop over nodes with a single matrix multiplication, which avoids the inefficiency the video attributes to the loop-based approach.
