Vectorizing Across Multiple Examples (C1W3L04)

DeepLearningAI · Beginner ·📐 ML Fundamentals ·8y ago

Key Takeaways

The video demonstrates how to vectorize across multiple training examples in a neural network, using equations from the previous video and modifying them to compute outputs for all examples at once. The process involves stacking training examples in columns of a matrix and using vectorized implementations of the equations to compute the outputs.
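As a compact reference for the stacking described above, the matrices can be written out as follows. This is a sketch in the lecture's notation, where $n_x$ is the number of input features, $m$ the number of training examples, and $\sigma$ the sigmoid function; adding $b^{[1]}$ and $b^{[2]}$ to every column is assumed to happen through broadcasting.

```latex
\begin{aligned}
X &= \begin{bmatrix} x^{(1)} & x^{(2)} & \cdots & x^{(m)} \end{bmatrix} \in \mathbb{R}^{n_x \times m} \\
Z^{[1]} &= \begin{bmatrix} z^{[1](1)} & z^{[1](2)} & \cdots & z^{[1](m)} \end{bmatrix} = W^{[1]} X + b^{[1]} \\
A^{[1]} &= \begin{bmatrix} a^{[1](1)} & a^{[1](2)} & \cdots & a^{[1](m)} \end{bmatrix} = \sigma\left(Z^{[1]}\right) \\
Z^{[2]} &= W^{[2]} A^{[1]} + b^{[2]} \\
A^{[2]} &= \sigma\left(Z^{[2]}\right)
\end{aligned}
```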

Full Transcript

In the last video, you saw how to compute the prediction of a neural network given a single training example. In this video, you'll see how to vectorize across multiple training examples. The outcome will be quite similar to what you saw for logistic regression: by stacking up different training examples in different columns of a matrix, you'll be able to take the equations from the previous video and, with very little modification, change them so that the neural network computes the outputs on all the examples pretty much at the same time. So let's see the details of how to do that.

These were the four equations we had from the previous video for how you compute $z^{[1]}$, $a^{[1]}$, $z^{[2]}$, and $a^{[2]}$. They tell you how, given an input feature vector $x$, you can generate $a^{[2]} = \hat{y}$ for a single training example. Now, if you have $m$ training examples, you need to repeat this process for, say, the first training example, $x^{(1)}$, to compute $\hat{y}^{(1)}$, which is the prediction on your first training example; then $x^{(2)}$, and use that to generate the prediction $\hat{y}^{(2)}$; and so on down to $x^{(m)}$ to generate the prediction $\hat{y}^{(m)}$. To write this in the activation function notation as well, I'm going to write these as $a^{[2](1)}$, $a^{[2](2)}$, and so on up to $a^{[2](m)}$. In this notation, $a^{[2](i)}$, the round bracket $i$ refers to training example $i$, and the square bracket 2 refers to layer 2. So that's how the square bracket and round bracket indices work.

This suggests that if you have an unvectorized implementation and want to compute the predictions for all your training examples, you need to do: for $i = 1$ to $m$, implement these four equations: $z^{[1](i)} = W^{[1]} x^{(i)} + b^{[1]}$, $a^{[1](i)} = \sigma(z^{[1](i)})$, $z^{[2](i)} = W^{[2]} a^{[1](i)} + b^{[2]}$, and $a^{[2](i)} = \sigma(z^{[2](i)})$. So it's basically these four equations on top, with the superscript round bracket $i$ added to all the variables that depend on the training example; that is, to $x$, $z$, and $a$.

If you want to compute all the outputs on your $m$ training examples, what we'd like to do is vectorize this whole computation so as to get rid of this for loop. And by the way, in case it seems like I'm getting into a lot of nitty-gritty linear algebra, it turns out that being able to implement this correctly is important in the deep learning era, and we actually chose the notation very carefully for this class to make these vectorization steps as easy as possible. So I hope that going through this nitty-gritty will actually help you more quickly get correct implementations of these algorithms working.

All right, so let me just copy this whole block of code to the next slide, and then we'll see how to vectorize it. Here's what we had from the previous slide, with the for loop going over all $m$ training examples. Recall that we defined the matrix $X$ to be our training examples stacked up in columns, like so: take the training examples and stack them in columns, so this becomes an $n$, or maybe $n_x$, by $m$ dimensional matrix. I'm just going to give away the punchline and tell you what you need to implement in order to have a vectorized implementation of this for loop. It turns out that what you need to do is compute capital $Z^{[1]} = W^{[1]} X + b^{[1]}$, capital $A^{[1]} = \sigma(Z^{[1]})$, then capital $Z^{[2]} = W^{[2]} A^{[1]} + b^{[2]}$, and then $A^{[2]} = \sigma(Z^{[2]})$.

If you want, the analogy is that we went from the lowercase vectors $x$ to this capital matrix $X$ by stacking up the lowercase $x$'s in different columns. If you do the same thing for the $z$'s, so for example if you take $z^{[1](1)}$, $z^{[1](2)}$, and so on, these are all column vectors, up to $z^{[1](m)}$, so that's this first quantity, all $m$ of them, and stack them in columns, then this gives you the matrix $Z^{[1]}$. Similarly, if you look at, say, this quantity and take $a^{[1](1)}$, $a^{[1](2)}$, and so on up to $a^{[1](m)}$, and stack them up in columns, then just as we went from lowercase $x$ to capital $X$, and lowercase $z$ to capital $Z$, this goes from the lowercase $a$'s, which are vectors, to this capital $A^{[1]}$ over there. And similarly for $Z^{[2]}$ and $A^{[2]}$: they're also obtained by taking these vectors and stacking them horizontally, in order to get capital $Z^{[2]}$ and capital $A^{[2]}$.

One property of this notation that might help you think about it is that for these matrices, say $Z$ and $A$, horizontally we're going to index across training examples. So that's why the horizontal index corresponds to different training examples: as you sweep from left to right, you're scanning through the training set. And vertically, the vertical index corresponds to different nodes in the neural network. So for example, this value at the top-left corner of the matrix corresponds to the activation of the first hidden unit on the first training example. One value down corresponds to the activation of the second hidden unit on the first training example, then the third hidden unit on the first training example, and so on. So as you scan down, you're indexing into the hidden unit number, whereas if you move horizontally, you're going from the first hidden unit on the first training example to the first hidden unit on the second training example, the third training example, and so on, until this node here corresponds to the activation of the first hidden unit on the final training example, the $m$-th training example.

OK, so horizontally, the matrix $A$ goes over different training examples, and vertically, the different indices in the matrix $A$ correspond to different hidden units. A similar intuition holds true for the matrix $Z$, as well as for $X$, where horizontally it corresponds to different training examples and vertically it corresponds to different input features, which are really different nodes in the input layer of the neural network.

So with these equations, you now know how to implement a neural network with vectorization, that is, vectorization across multiple examples. In the next video, I want to show you a bit more justification for why this is a correct implementation of this type of vectorization. It turns out the justification will be similar to what you have seen for logistic regression. Let's go on to the next video.
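To make the for loop and its vectorized replacement concrete, here is a minimal NumPy sketch of both for a two-layer network with sigmoid activations. The function names (`forward_loop`, `forward_vectorized`), the layer sizes, and the random parameters are assumptions chosen for illustration; they are not from the lecture. The sketch relies on NumPy broadcasting to add the `(n_h, 1)` bias column to every column of the `(n_h, m)` product.

```python
import numpy as np


def sigmoid(z):
    """Element-wise logistic sigmoid."""
    return 1.0 / (1.0 + np.exp(-z))


def forward_loop(X, W1, b1, W2, b2):
    """Unvectorized version: apply the four equations to one example at a time."""
    m = X.shape[1]
    A2 = np.zeros((W2.shape[0], m))
    for i in range(m):
        x_i = X[:, i:i + 1]           # column vector x^(i), shape (n_x, 1)
        z1 = W1 @ x_i + b1            # z^[1](i)
        a1 = sigmoid(z1)              # a^[1](i)
        z2 = W2 @ a1 + b2             # z^[2](i)
        A2[:, i:i + 1] = sigmoid(z2)  # a^[2](i), stored as column i
    return A2


def forward_vectorized(X, W1, b1, W2, b2):
    """Vectorized version: all m examples are processed as columns of X."""
    Z1 = W1 @ X + b1    # b1 is broadcast across the m columns
    A1 = sigmoid(Z1)
    Z2 = W2 @ A1 + b2   # b2 is broadcast across the m columns
    return sigmoid(Z2)  # A^[2], shape (n_y, m)


# Hypothetical sizes: 3 input features, 4 hidden units, 1 output, 5 examples.
n_x, n_h, n_y, m = 3, 4, 1, 5
rng = np.random.default_rng(0)
X = rng.standard_normal((n_x, m))
W1 = rng.standard_normal((n_h, n_x))
b1 = rng.standard_normal((n_h, 1))
W2 = rng.standard_normal((n_y, n_h))
b2 = rng.standard_normal((n_y, 1))

A2_loop = forward_loop(X, W1, b1, W2, b2)
A2_vec = forward_vectorized(X, W1, b1, W2, b2)
print(np.allclose(A2_loop, A2_vec))
```

Both functions should agree up to floating-point error; the vectorized one replaces the explicit Python loop with two matrix products.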

Original Description

Take the Deep Learning Specialization: http://bit.ly/2IfZoml Check out all our courses: https://www.deeplearning.ai Subscribe to The Batch, our weekly newsletter: https://www.deeplearning.ai/thebatch Follow us: Twitter: https://twitter.com/deeplearningai_ Facebook: https://www.facebook.com/deeplearningHQ/ Linkedin: https://www.linkedin.com/company/deeplearningai

This video teaches how to vectorize across multiple training examples in a neural network, allowing for efficient computation of outputs for all examples at once. The process involves modifying equations from the previous video to use matrix operations and vectorized implementations.

Key Takeaways

Stack training examples in columns of a matrix
Modify equations to compute outputs for all examples at once
Use vectorized implementations of the equations
Compute the matrices $Z^{[1]}$, $A^{[1]}$, $Z^{[2]}$, and $A^{[2]}$
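The stacking step in the list above can be sketched in NumPy as follows, to show the resulting shapes and what the row and column indices mean. The sizes, the random data, and the variable names are hypothetical and chosen only for illustration.

```python
import numpy as np


def sigmoid(z):
    """Element-wise logistic sigmoid."""
    return 1.0 / (1.0 + np.exp(-z))


# Hypothetical sizes: 2 input features, 3 hidden units, 4 training examples.
n_x, n_h, m = 2, 3, 4
rng = np.random.default_rng(1)

# Each training example is a vector of n_x features.
examples = [rng.standard_normal(n_x) for _ in range(m)]

# Stack the examples as columns: X has shape (n_x, m).
X = np.column_stack(examples)

W1 = rng.standard_normal((n_h, n_x))
b1 = rng.standard_normal((n_h, 1))

Z1 = W1 @ X + b1  # shape (n_h, m); b1 is broadcast across columns
A1 = sigmoid(Z1)  # shape (n_h, m)

print(X.shape, Z1.shape, A1.shape)  # (2, 4) (3, 4) (3, 4)

# Rows index hidden units; columns index training examples.
first_unit_all_examples = A1[0, :]  # first hidden unit, every example
all_units_first_example = A1[:, 0]  # every hidden unit, first example
```

Reading across a row follows one hidden unit through the training set, while reading down a column gives all hidden-unit activations for one example.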

💡 Vectorization across multiple examples allows for efficient computation of outputs for all examples at once, making it a crucial concept in deep learning.
