Computing Neural Network Output (C1W3L03)

DeepLearningAI · Beginner ·📐 ML Fundamentals ·8y ago

Key Takeaways

This video shows how a two-layer neural network (one hidden layer plus an output layer) computes its output for a single input example. Each hidden unit repeats the two steps of logistic regression, a linear step followed by a sigmoid activation, and the per-unit equations are then vectorized into matrix multiplications.

Full Transcript

In the last video, you saw what a single-hidden-layer neural network looks like. In this video, let's go through the details of exactly how this neural network computes its outputs. What you'll see is that it is like logistic regression, but repeated a lot of times. Let's take a look.

So this is what a two-layer neural network looks like. Let's go more deeply into exactly what this neural network computes. Now, we said before that the circle in logistic regression really represents two steps of computation: first you compute z as follows, and second you compute the activation as a sigmoid function of z. A neural network just does this a lot more times.

Let's start by focusing on just one of the nodes in the hidden layer. Let's look at the first node in the hidden layer; I've grayed out the other nodes for now. Similar to logistic regression on the left, this node in the hidden layer does two steps of computation. The first step, which you can think of as the left half of this node, computes z = w^T x + b. The notation we'll use is that these are all quantities associated with the first hidden layer, so that's why we have a bunch of square brackets there, and this is the first node in the hidden layer, so that's why we have the subscript 1 over there. So first it computes z^[1]_1 = (w^[1]_1)^T x + b^[1]_1, and then the second step is that it computes a^[1]_1 = sigmoid(z^[1]_1), like so. For both z and a, the notational convention is that in a^[l]_i, the l in superscript square brackets refers to the layer number, and the subscript i refers to the node in that layer. The node we're looking at is in layer 1, that is, the hidden layer, and it is node 1, so that's why the superscript and subscript are both 1. So that little circle, that first node in the neural network, represents carrying out these two steps of computation.

Now let's look at the second node in the hidden layer of the neural network. Similar to the logistic regression unit on the left, this little circle represents two steps of computation. The first step is computing z (this is still layer 1, but now the second node): z^[1]_2 = (w^[1]_2)^T x + b^[1]_2, and then a^[1]_2 = sigmoid(z^[1]_2). Again, feel free to pause the video if you want, so that you can double-check that the superscript and subscript notation is consistent with what we have written here above in purple.

So we've talked through the first two hidden units in the neural network. Hidden units 3 and 4 also represent some computations. So now let me take this pair of equations and this pair of equations and copy them to the next slide. Here's our neural network, and here are the first and second pairs of equations that we worked out previously for the first and second hidden units. If you then go through and write out the corresponding equations for the third and fourth hidden units, you get the following. And just to make sure this notation is clear: this is the vector w^[1]_1, and this is that vector transposed, times x. That's what the superscript T there represents: the vector transpose.

Now, as you might have guessed, if you're actually implementing a neural network, doing this with a for loop seems really inefficient. So what we're going to do is take these four equations and vectorize them. I'm going to start by showing how to compute z as a vector. It turns out you can do it as follows. Let me take these w's and stack them into a matrix. Then you have (w^[1]_1)^T, so that's a row vector (the transpose of a column vector gives you a row vector), then (w^[1]_2)^T, (w^[1]_3)^T, and (w^[1]_4)^T. So by stacking those four w vectors together, you end up with a matrix. Another way to think of this is that we have four logistic regression units there, and each of the logistic regression units has a corresponding parameter vector w. By stacking those four vectors together, you end up with this 4 by 3 matrix.

If you then take this matrix and multiply it by your input features x1, x2, x3, you end up with, by how matrix multiplication works, (w^[1]_1)^T x, (w^[1]_2)^T x, (w^[1]_3)^T x, and (w^[1]_4)^T x. And now let's not forget the b's. So we now add to this the vector b^[1]_1, b^[1]_2, b^[1]_3, b^[1]_4. You see that each of the four rows of this outcome corresponds exactly to each of these four quantities that we had above. In other words, we've just shown that this thing is equal to z^[1]_1, z^[1]_2, z^[1]_3, z^[1]_4, as defined here. And maybe not surprisingly, we're going to call this whole thing the vector z^[1], which is obtained by stacking up these individual z's into a column vector.

When we're vectorizing, one of the rules of thumb that might help you navigate this is that when we have different nodes in a layer, we stack them vertically. So that's why, when you have z^[1]_1 through z^[1]_4, those correspond to four different nodes in the hidden layer, and so we stack these four numbers vertically to form the vector z^[1]. To introduce one more piece of notation: this 4 by 3 matrix here, which we obtained by stacking the lowercase w^[1]_1, w^[1]_2, and so on, we're going to call capital W^[1]. And similarly, this vector we're going to call b^[1], and so this is a 4 by 1 vector.

So now we've computed z using this vector-matrix notation. The last thing we need to do is also compute these values of a. It probably won't surprise you to see that we're going to define a^[1] as just stacking together those activation values a^[1]_1 to a^[1]_4. So just take these four values and stack them together in a vector called a^[1], and this is going to be sigmoid(z^[1]), where this is now an implementation of the sigmoid function that takes in the four elements of z and applies the sigmoid function element-wise to it.

So, just to recap, we figured out that z^[1] is equal to W^[1] times the vector x plus the vector b^[1], and a^[1] is sigmoid of z^[1]. Let's just copy this to the next slide. What we see is that for the first layer of the neural network, given an input x, we have that z^[1] = W^[1] x + b^[1] and a^[1] = sigmoid(z^[1]). The dimensions of this are: z^[1] is 4 by 1, which equals a 4 by 3 matrix times a 3 by 1 vector, plus a 4 by 1 vector b^[1], and a^[1] is 4 by 1, the same dimensions as z^[1]. And remember that we said x is equal to a^[0], just like y hat is also equal to a^[2]. So if you want, you can actually take this x and replace it with a^[0], since a^[0] is, if you like, an alias for the vector of input features x.

Now, through a similar derivation, you can figure out that the representation for the next layer can also be written similarly. The output layer has associated with it the parameters W^[2] and b^[2]. W^[2] in this case is going to be a 1 by 4 matrix, and b^[2] is just a real number, so 1 by 1. And so z^[2] is going to be a real number, written as a 1 by 1 matrix: it's a 1 by 4 thing times a^[1], which was 4 by 1, plus b^[2], which is 1 by 1, and so this gives you just a real number.

If you think of this last output unit as being analogous to logistic regression, which had parameters w and b, then w really plays an analogous role to W^[2] transpose, or W^[2] is really w transpose, and b is equal to b^[2]. So if you were to cover up the left of this network and ignore all that for now, then this last output unit is a lot like logistic regression, except that instead of writing the parameters as w and b, we're writing them as W^[2] and b^[2], with dimensions 1 by 4 and 1 by 1.

So, just to recap: for logistic regression, to implement the output, or to implement prediction, you compute z = w^T x + b and y hat = a = sigmoid(z). When you have a neural network with one hidden layer, what you need to implement to compute its output is just these four equations. You can think of this as a vectorized implementation of computing the output of, first, these four logistic regression units in the hidden layer, that's what this does, and then this logistic regression in the output layer, which is what this does. I hope this description made sense, but the takeaway is that to compute the output of this neural network, all you need is those four lines of code.

So now you've seen how, given a single input feature vector x, you can, with four lines of code, compute the output of this neural network. Similar to what we did for logistic regression, we'll also want to vectorize across multiple training examples. We'll see that by stacking up training examples in different columns of a matrix, with just a slight modification to this, you will also, similar to what you saw in logistic regression, be able to compute the output of this neural network not just on one example at a time, but on, say, your entire training set at a time. So let's see the details of that in the next video.
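The "four lines of code" mentioned in the transcript can be sketched roughly as follows. This is a minimal illustration, not the course's own code: it assumes NumPy, the names `sigmoid` and `forward` are hypothetical, and the parameter values are random placeholders, since the video gives no concrete numbers. Only the shapes (3 inputs, 4 hidden units, 1 output) come from the video.

```python
import numpy as np


def sigmoid(z):
    """Element-wise logistic sigmoid."""
    return 1.0 / (1.0 + np.exp(-z))


def forward(x, W1, b1, W2, b2):
    """Forward pass for a single example x of shape (3, 1)."""
    z1 = W1 @ x + b1      # (4, 3) @ (3, 1) + (4, 1) -> (4, 1)
    a1 = sigmoid(z1)      # (4, 1), hidden-layer activations
    z2 = W2 @ a1 + b2     # (1, 4) @ (4, 1) + (1, 1) -> (1, 1)
    a2 = sigmoid(z2)      # (1, 1), the prediction y hat
    return z1, a1, z2, a2


# Placeholder values for illustration only (assumed, not from the video).
rng = np.random.default_rng(0)
x = rng.standard_normal((3, 1))
W1 = rng.standard_normal((4, 3))
b1 = np.zeros((4, 1))
W2 = rng.standard_normal((1, 4))
b2 = np.zeros((1, 1))

z1, a1, z2, a2 = forward(x, W1, b1, W2, b2)
print(z1.shape, a1.shape, z2.shape, a2.shape)
```

Keeping `x`, `b1`, and `b2` as explicit column vectors and 1 by 1 arrays, rather than flat arrays, makes the shapes in the code match the dimensions stated in the transcript.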

Original Description

Take the Deep Learning Specialization: http://bit.ly/38jCe9e
Check out all our courses: https://www.deeplearning.ai
Subscribe to The Batch, our weekly newsletter: https://www.deeplearning.ai/thebatch
Follow us:
Twitter: https://twitter.com/deeplearningai_
Facebook: https://www.facebook.com/deeplearningHQ/
Linkedin: https://www.linkedin.com/company/deeplearningai

The video derives the four equations needed to compute the output of a two-layer neural network on a single input example: z^[1] = W^[1] x + b^[1], a^[1] = sigmoid(z^[1]), z^[2] = W^[2] a^[1] + b^[2], and y_hat = a^[2] = sigmoid(z^[2]). For the network in the video, with 3 input features and 4 hidden units, W^[1] is 4 by 3, b^[1] is 4 by 1, W^[2] is 1 by 4, and b^[2] is 1 by 1.

Key Takeaways
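  1. For each hidden unit i, compute z^[1]_i = (w^[1]_i)^T x + b^[1]_i
  2. Compute that unit's activation as a^[1]_i = sigmoid(z^[1]_i)
  3. Stack the transposed parameter vectors (w^[1]_1)^T through (w^[1]_4)^T as the rows of the 4 by 3 matrix W^[1]
  4. Multiply the matrix W^[1] by the input feature vector x
  5. Add the bias vector b^[1] to the result to get z^[1]
  6. Stack the values for the different nodes in a layer vertically, into a column vector
  7. Stack the biases b^[1]_1 through b^[1]_4 to form the 4 by 1 vector b^[1]
  8. Compute y_hat = a^[2] = sigmoid(z^[2]), where z^[2] = W^[2] a^[1] + b^[2]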
💡 Vectorizing individual nodes in a layer and using matrix multiplication can significantly speed up neural network computations.
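As a rough check of the stacking idea in the takeaways, the sketch below computes the hidden-layer activations twice: once with a for loop over the four hidden units, and once with the stacked matrix. It assumes NumPy, uses random placeholder parameters, and all variable names are illustrative rather than taken from the video.

```python
import numpy as np


def sigmoid(z):
    """Element-wise logistic sigmoid."""
    return 1.0 / (1.0 + np.exp(-z))


# Placeholder values for illustration only (assumed, not from the video).
rng = np.random.default_rng(1)
x = rng.standard_normal((3, 1))                      # input features x1, x2, x3
w = [rng.standard_normal((3, 1)) for _ in range(4)]  # one column vector per hidden unit
b = [float(v) for v in rng.standard_normal(4)]       # one scalar bias per hidden unit

# Per-node version: one logistic-regression-style computation per hidden unit.
a_loop = np.zeros((4, 1))
for i in range(4):
    z_i = (w[i].T @ x).item() + b[i]
    a_loop[i, 0] = sigmoid(z_i)

# Vectorized version: each transposed w becomes a row of W1, biases form b1.
W1 = np.vstack([w_i.T for w_i in w])   # shape (4, 3)
b1 = np.array(b).reshape(4, 1)         # shape (4, 1)
a_vec = sigmoid(W1 @ x + b1)           # shape (4, 1)

print(np.allclose(a_loop, a_vec))
```

Both versions should agree up to floating-point rounding; the vectorized form replaces the explicit Python loop over units with a single matrix product.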
