Vectorizing Across Multiple Examples (C1W3L04)
Key Takeaways
The video demonstrates how to vectorize across multiple training examples in a neural network, using equations from the previous video and modifying them to compute outputs for all examples at once. The process involves stacking training examples in columns of a matrix and using vectorized implementations of the equations to compute the outputs.
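As a compact reference for the stacking described above, the matrices and the vectorized equations can be written out as below. This is a sketch in the course's bracket notation: square brackets index the layer and round brackets index the training example. The symbol $n_x$ (number of input features) is assumed here for labelling the shape of $X$, and the bias $b^{[l]}$ is understood to be added to every column.

```latex
X = \begin{bmatrix} x^{(1)} & x^{(2)} & \cdots & x^{(m)} \end{bmatrix} \in \mathbb{R}^{n_x \times m},
\qquad
Z^{[1]} = \begin{bmatrix} z^{[1](1)} & z^{[1](2)} & \cdots & z^{[1](m)} \end{bmatrix},
\qquad
A^{[1]} = \begin{bmatrix} a^{[1](1)} & a^{[1](2)} & \cdots & a^{[1](m)} \end{bmatrix}

\begin{aligned}
Z^{[1]} &= W^{[1]} X + b^{[1]}, & A^{[1]} &= \sigma\!\left(Z^{[1]}\right), \\
Z^{[2]} &= W^{[2]} A^{[1]} + b^{[2]}, & A^{[2]} &= \sigma\!\left(Z^{[2]}\right).
\end{aligned}
```

Column $i$ of each capital-letter matrix is the corresponding lowercase vector for training example $i$, so column $i$ of $A^{[2]}$ is the prediction $\hat{y}^{(i)}$.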
Full Transcript
In the last video, you saw how to compute the prediction of a neural network given a single training example. In this video, you'll see how to vectorize across multiple training examples. The outcome will be quite similar to what you saw for logistic regression: by stacking up different training examples in different columns of a matrix, you'll be able to take the equations from the previous video and, with very little modification, change them to make the neural network compute the outputs on all the examples pretty much at the same time. So let's see the details of how to do that.
These were the four equations we had from the previous video for how you compute z[1], a[1], z[2], and a[2]. They tell you how, given an input feature vector x, you can generate a[2] = ŷ for a single training example. Now, if you have m training examples, you need to repeat this process for, say, the first training example, x(1), to compute ŷ(1), the prediction on your first training example; then x(2), to generate the prediction ŷ(2); and so on down to x(m), to generate the prediction ŷ(m).
To write this in the activation notation as well, I'm going to write these as a[2](1), a[2](2), and so on up to a[2](m). In the notation a[2](i), the round bracket (i) refers to training example i and the square bracket [2] refers to layer 2. So that's how the square-bracket and round-bracket indices work.
This suggests that if you have an unvectorized implementation and want to compute the predictions for all your training examples, you need to do the following: for i = 1 to m, implement these four equations:
    z[1](i) = W[1] x(i) + b[1]
    a[1](i) = sigmoid(z[1](i))
    z[2](i) = W[2] a[1](i) + b[2]
    a[2](i) = sigmoid(z[2](i))
So it's basically the four equations on top, with the superscript round bracket (i) added to all the variables that depend on the training example, that is, to x, z, and a, if you want to compute all the outputs on your m training examples. What we'd like to do is vectorize this whole computation so as to get rid of this for loop.
By the way, in case it seems like I'm getting into a lot of nitty-gritty linear algebra, it turns out that being able to implement this correctly is important in the deep learning era, and we chose the notation for this class very carefully to make these vectorization steps as easy as possible. So I hope that going through this nitty-gritty will actually help you get correct implementations of these algorithms working more quickly.
All right, let me copy this whole block of code to the next slide, and then we'll see how to vectorize it. So here's what we had from the previous slide, with the for loop going over all m training examples. Recall that we defined the matrix X to be our training examples stacked up in columns, like so. Take the training examples and stack them in columns, and this becomes an n, or maybe n_x, by m dimensional matrix.
I'm just going to give away the punchline and tell you what you need to implement in order to have a vectorized implementation of this for loop. It turns out that what you need to do is compute:
    Z[1] = W[1] X + b[1]
    A[1] = sigmoid(Z[1])
    Z[2] = W[2] A[1] + b[2]
    A[2] = sigmoid(Z[2])
If you want, the analogy is that we went from the lowercase vectors x to this capital X matrix by stacking up the lowercase x's in different columns. If you do the same thing for the z's, so for example if you take z[1](1), z[1](2), and so on up to z[1](m), which are all column vectors, and stack all m of them in columns, then this gives you the matrix Z[1]. Similarly, if you take a[1](1), a[1](2), and so on up to a[1](m), and stack them up in columns, then, just as we went from lowercase x to capital X and from lowercase z to capital Z, this goes from the lowercase a's, which are vectors, to this capital A[1] over there. The same holds for Z[2] and A[2]: they are also obtained by taking these vectors and stacking them horizontally, in order to get capital Z[2] and capital A[2].
One property of this notation that might help you think about it is that for these matrices, say Z and A, horizontally we're going to index across training examples. So the horizontal index corresponds to different training examples; as you sweep from left to right, you're scanning through the training set. Vertically, the vertical index corresponds to different nodes in the neural network. So, for example, this value at the top-left corner of the matrix corresponds to the activation of the first hidden unit on the first training example. One value down corresponds to the activation of the second hidden unit on the first training example, then the third hidden unit on the first training example, and so on. So as you scan down, you're indexing into the hidden unit number, whereas if you move horizontally, you're going from the first hidden unit on the first training example to the first hidden unit on the second training example, the third training example, and so on, until this node here corresponds to the activation of the first hidden unit on the final training example, the m-th training example.
OK, so horizontally the matrix A goes over different training examples, and vertically the different indices in the matrix A correspond to different hidden units. A similar intuition holds for the matrix Z, as well as for X, where horizontally it corresponds to different training examples and vertically it corresponds to different input features, which are really the different nodes in the input layer of the neural network.
So with these equations, you now know how to implement a neural network with vectorization, that is, vectorization across multiple examples. In the next video, I want to show you a bit more justification for why this is a correct implementation of this type of vectorization. It turns out the justification will be similar to what you saw for logistic regression. Let's go on to the next video.
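The for loop and its vectorized replacement described in the transcript can be sketched in NumPy as follows. This is a minimal illustration, not code from the course: the function names `forward_loop` and `forward_vectorized`, the layer sizes, and the random parameters are all assumptions made for the example, and the bias is added to every column through NumPy broadcasting.

```python
import numpy as np


def sigmoid(z):
    """Element-wise logistic function."""
    return 1.0 / (1.0 + np.exp(-z))


def forward_loop(X, W1, b1, W2, b2):
    """Unvectorized forward pass: one training example (column of X) at a time."""
    m = X.shape[1]
    A2 = np.zeros((W2.shape[0], m))
    for i in range(m):
        x_i = X[:, i:i + 1]        # column vector x(i), shape (n_x, 1)
        z1 = W1 @ x_i + b1         # z[1](i)
        a1 = sigmoid(z1)           # a[1](i)
        z2 = W2 @ a1 + b2          # z[2](i)
        a2 = sigmoid(z2)           # a[2](i), the prediction for example i
        A2[:, i:i + 1] = a2
    return A2


def forward_vectorized(X, W1, b1, W2, b2):
    """Vectorized forward pass over all m examples at once."""
    Z1 = W1 @ X + b1               # b1 has shape (n_1, 1) and broadcasts over columns
    A1 = sigmoid(Z1)
    Z2 = W2 @ A1 + b2
    A2 = sigmoid(Z2)
    return A2


# Hypothetical sizes, chosen only for illustration.
n_x, n_1, n_2, m = 3, 4, 1, 5
rng = np.random.default_rng(0)
X = rng.standard_normal((n_x, m))      # examples stacked in columns
W1 = rng.standard_normal((n_1, n_x))
b1 = rng.standard_normal((n_1, 1))
W2 = rng.standard_normal((n_2, n_1))
b2 = rng.standard_normal((n_2, 1))

A2_loop = forward_loop(X, W1, b1, W2, b2)
A2_vec = forward_vectorized(X, W1, b1, W2, b2)
print(A2_vec.shape)                    # (1, 5)
print(np.allclose(A2_loop, A2_vec))    # True
```

Both functions return a matrix with one column per training example, so column i of the result is the prediction for example i. The two results agree up to floating-point tolerance, which is why the comparison uses `np.allclose` rather than exact equality.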
Original Description
Take the Deep Learning Specialization: http://bit.ly/2IfZoml
Check out all our courses: https://www.deeplearning.ai
Subscribe to The Batch, our weekly newsletter: https://www.deeplearning.ai/thebatch
Follow us:
Twitter: https://twitter.com/deeplearningai_
Facebook: https://www.facebook.com/deeplearningHQ/
Linkedin: https://www.linkedin.com/company/deeplearningai