Why Deep Representations? (C1W4L04)

DeepLearningAI · Beginner · 📐 ML Fundamentals · 8y ago

Key Takeaways

The video explains why depth matters in neural networks: deep networks learn complex functions by composing simpler ones, with early layers detecting simple features and later layers combining them. It illustrates this with images (edges, then parts of faces, then faces) and speech (waveform features, then phonemes, then words and sentences). It also covers a result from circuit theory: some functions, such as the parity of n bits, can be computed by a small deep network but require exponentially many hidden units in a shallow one.

Full Transcript

We've all been hearing that deep neural networks work really well for a lot of problems. It's not just that they need to be big neural networks; it's that, specifically, they need to be deep, or to have a lot of hidden layers. So why is that? Let's go through a couple of examples and try to gain some intuition for why deep networks might work well.

So first, what is a deep network computing? If you're building a system for face recognition or face detection, here's what the deep neural network could be doing. Perhaps you input a picture of a face. Then the first layer of the neural network you can think of as maybe being a feature detector or an edge detector. In this example, I'm plotting what a neural network with maybe twenty hidden units might be trying to compute on this image, with the twenty hidden units visualized by these little square boxes. So, for example, this little visualization represents a hidden unit that's trying to figure out where the edges of that orientation are in the image, and maybe this hidden unit might be trying to figure out where the horizontal edges are in this image. When we talk about convolutional networks in a later course, this particular visualization will make a bit more sense. But for now, you can think of the first layer of the neural network as looking at the picture and trying to figure out where the edges are in this picture.

Now, having figured out where the edges are in this picture by grouping together pixels to form edges, it can then take the detected edges and group edges together to form parts of faces. So, for example, you might have one neuron trying to see if it's finding an eye, or a different neuron trying to find that part of the nose. And so by putting together lots of edges, it can start to detect different parts of faces. And then finally, by putting together different parts of faces, like an eye or a nose or an ear or a chin, it can then try to recognize or detect different types of faces.

So intuitively, you can think of the earlier layers of a neural network as detecting simpler functions, like edges, and then composing them together in the later layers of the neural network so that they can learn more and more complex functions. These visualizations will make more sense when we talk about convolutional nets. And one technical detail of this visualization: the edge detectors are looking at relatively small areas of an image, maybe very small regions like that, and then the facial detectors can look at maybe much larger areas of the image. But the main intuition to take away from this is just finding simpler things like edges, then building them up, composing them together to detect more complex things like an eye or a nose, and then composing those together to find even more complex things.

This type of simple-to-complex hierarchical representation, or compositional representation, applies to other types of data than images and face recognition as well. For example, if you're trying to build a speech recognition system, it's hard to visualize speech, but if you input an audio clip, then maybe the first level of a neural network might learn to detect low-level audio waveform features, such as: is this tone going up, is it going down, is it white noise or a sibilant sound, and what is the pitch? It can learn to detect low-level waveform features like that. And then by composing low-level waveforms, maybe it will learn to detect basic units of sound, which in linguistics are called phonemes. For example, in the word "cat", the "c" sound is a phoneme, the "a" sound is a phoneme, and the "t" sound is another phoneme. So it learns to find maybe the basic units of sound, and then by composing those together, maybe it learns to recognize words in the audio, and then maybe composes those together in order to recognize entire phrases or sentences.

So a deep neural network with multiple hidden layers might be able to have the earlier layers learn these lower-level, simpler features, and then have the later, deeper layers put together the simpler things it has detected in order to detect more complex things, like recognizing specific words or even phrases or sentences that you're saying, in order to carry out speech recognition. And what we see is that whereas the earlier layers are computing what seem like relatively simple functions of the input, such as where the edges are, by the time you get deep in the network you can actually do surprisingly complex things, such as detect faces or detect words or phrases or sentences.

Some people like to make an analogy between deep neural networks and the human brain, where we believe, or neuroscientists believe, that the human brain also starts off detecting simple things like edges in what your eyes see, and then builds those up to detect more complex things like the faces that you see. I think analogies between deep learning and the human brain are sometimes a little bit dangerous, but there is a lot of truth to this being how we think the human brain works, and that the human brain probably detects simple things like edges first and then puts them together to form more and more complex objects. And so that has served as a loose form of inspiration for some deep learning as well. We'll say a bit more about the human brain, or about the biological brain, in a later video this week.

The other piece of intuition about why deep networks seem to work well is the following. This result comes from circuit theory, which pertains to thinking about what types of functions you can compute with different AND gates, OR gates, and NOT gates, basically logic gates. So informally, there are functions you can compute with a relatively small but deep neural network, and by small I mean the number of hidden units is relatively small, but if you try to compute the same function with a shallow network, so if there aren't enough hidden layers, then you might require exponentially more hidden units to compute it.

So let me just give you one example and illustrate this a bit informally. Let's say you're trying to compute the exclusive-or, or the parity, of all your input features. So you're trying to compute x1 XOR x2 XOR x3 XOR ... up to xn, if you have n, or n_x, features. So if you build an XOR tree like this: first compute the XOR of x1 and x2, then take x3 and x4 and compute their XOR. And technically, if you're just using AND, OR, and NOT gates, you might need a couple of layers to compute the XOR function rather than just one layer, but with a relatively small circuit you can compute the XOR, and so on. And then you can build an XOR tree like so, until eventually you have a circuit here that outputs, well, let's call this y: it outputs y-hat equals y, the exclusive-or, the parity, of all of these input bits. So to compute the XOR, the depth of the network will be on the order of log n with this type of XOR tree. So the number of nodes, or the number of circuit components, or the number of gates in this network is not that large. You don't need that many gates in order to compute the exclusive-or.

But now, if you're not allowed to use a neural network with multiple hidden layers, with in this case order log n hidden layers, if you're forced to compute this function with just one hidden layer, so you have all these things going into the hidden units, and then these things output y, then in order to compute the parity, to compute this XOR function, this hidden layer will need to be exponentially large, because essentially you need to exhaustively enumerate all 2^n possible configurations, so on the order of 2^n possible configurations of the input bits that result in the exclusive-or being either one or zero. So you end up needing a hidden layer that is exponentially large in the number of bits. I think technically you could do this with 2^(n-1) hidden units, but that's still on the order of 2^n, so it's going to be exponentially large in the number of bits. So I hope this gives a sense that there are mathematical functions that are much easier to compute with deep networks than with shallow networks. I have to admit, I personally found the result from circuit theory less useful for gaining intuitions, but this is one of the results that people often cite when explaining the value of having very deep representations.

Now, in addition to these reasons for preferring deep neural networks, to be perfectly honest, I think the other reason the term "deep learning" has taken off is just branding. These things used to be called neural networks with a lot of hidden layers, but the phrase "deep learning" is just a great brand. It just sounds so deep, right? So I think that once that term caught on, it was really neural networks rebranded, or neural networks with many hidden layers rebranded, and that helped to capture the popular imagination as well.

But regardless of the PR branding, deep networks do work well. Sometimes people go overboard and insist on using tons of hidden layers, but when I'm starting on a new problem, I often really start out with even logistic regression, then try something with one or two hidden layers, and use the number of hidden layers as a hyperparameter that you tune in order to try to find the right depth for your neural network. But over the last several years, there has been a trend toward people finding that, for some applications, very, very deep neural networks, with maybe many dozens of layers, can sometimes be the best model for a problem.

So that's it for the intuitions for why deep learning seems to work well. Let's now take a look at the mechanics of how to implement not just forward propagation but also back propagation.
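The parity example from the circuit-theory part of the lecture can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not code from the course: the function names `xor_tree` and `shallow_parity` are hypothetical, each XOR is counted as a single gate (the lecture notes that with only AND, OR, and NOT gates an XOR would itself take a couple of layers), and the "one hidden layer" is modeled as one pattern detector per odd-parity input rather than as trained neurons.

```python
from itertools import product
from math import ceil, log2


def xor_tree(bits):
    """Parity of `bits` (at least one bit) via a balanced tree of 2-input XORs.

    Returns (parity, depth, gate_count). Assumption: one XOR = one gate.
    """
    level = list(bits)
    depth = 0
    gates = 0
    while len(level) > 1:
        nxt = []
        # Pair up neighbours and XOR each pair.
        for i in range(0, len(level) - 1, 2):
            nxt.append(level[i] ^ level[i + 1])
            gates += 1
        if len(level) % 2 == 1:
            nxt.append(level[-1])  # unpaired bit passes through unchanged
        level = nxt
        depth += 1
    return level[0], depth, gates


def shallow_parity(bits):
    """Parity via a single layer of hypothetical pattern detectors.

    One hidden unit per odd-parity input pattern; the output is the OR
    of all hidden units. Returns (parity, hidden_unit_count).
    """
    n = len(bits)
    patterns = [p for p in product((0, 1), repeat=n) if sum(p) % 2 == 1]
    hidden = [int(tuple(bits) == p) for p in patterns]
    return int(any(hidden)), len(patterns)


if __name__ == "__main__":
    for n in (4, 8, 12):
        bits = [1] * n
        _, depth, gates = xor_tree(bits)
        _, units = shallow_parity(bits)
        print(f"n={n}: tree depth={depth} (ceil(log2 n)={ceil(log2(n))}), "
              f"tree gates={gates}, shallow hidden units={units}")
```

In this sketch the tree uses n - 1 gates at depth about log2(n), while the single-layer version needs 2^(n-1) detectors, which is the gap between the deep and shallow constructions that the lecture describes.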

Original Description

Take the Deep Learning Specialization: http://bit.ly/32Iw01H Check out all our courses: https://www.deeplearning.ai Subscribe to The Batch, our weekly newsletter: https://www.deeplearning.ai/thebatch Follow us: Twitter: https://twitter.com/deeplearningai_ Facebook: https://www.facebook.com/deeplearningHQ/ Linkedin: https://www.linkedin.com/company/deeplearningai

This video explains why deep representations matter in neural networks and how they let a network learn complex functions by composing simpler ones. It also covers the benefits of deep networks over shallow ones, using a circuit-theory example (computing the parity of n bits), and notes that the number of hidden layers is a hyperparameter to tune, starting from simple models.

Key Takeaways

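Understand the concept of hierarchical (compositional) representation
Learn how deep neural networks compose simpler features (edges, low-level audio features) into more complex ones (faces, words, sentences)
See why a shallow network may need exponentially more hidden units than a deep one for functions such as parity
Treat the number of hidden layers as a hyperparameter, starting with simple models such as logistic regression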

💡 Deep neural networks can compute certain mathematical functions, such as the parity of n input bits, with far fewer hidden units than shallow networks need.
