Why Deep Representations? (C1W4L04)

DeepLearningAI · Beginner ·📐 ML Fundamentals ·8y ago

Key Takeaways

The video explains why depth matters in neural networks: deep networks learn complex functions by composing simpler ones, and this simple-to-complex hierarchy applies to different kinds of data, such as images and speech. It also covers a benefit of deep networks over shallow ones: certain mathematical functions can be computed with far fewer hidden units when more layers are allowed.

Full Transcript

We've all been hearing that deep neural networks work really well for a lot of problems. It's not just that they need to be big neural networks; it's that specifically they need to be deep, or to have a lot of hidden layers. So why is that? Let's go through a couple of examples and try to gain some intuition for why deep networks might work well.

So first, what is a deep network computing? If you're building a system for face recognition or face detection, here's what the deep neural network could be doing. Perhaps you input a picture of a face. Then the first layer of the neural network you can think of as maybe being a feature detector or an edge detector. In this example, I'm plotting what a neural network with maybe twenty hidden units might be trying to compute on this image, with the twenty hidden units visualized by these little square boxes. So for example, this little visualization represents a hidden unit that's trying to figure out where the edges of that orientation are in the image, and maybe this hidden unit might be trying to figure out where the horizontal edges are in this image. When we talk about convolutional networks in a later course, this particular visualization will make a bit more sense, but for now you can think of the first layer of the neural network as looking at the picture and trying to figure out where the edges are in this picture.

Having figured out where the edges are in this picture by grouping together pixels to form edges, it can then take the detected edges and group edges together to form parts of faces. So for example, you might have one neuron trying to see if it is finding an eye, or a different neuron trying to find that part of the nose. And so by putting together lots of edges, it can start to detect different parts of faces. And then finally, by putting together different parts of faces, like an eye or a nose or an ear or a chin, it can then try to recognize or detect different types of faces. So intuitively, you can think of the earlier layers of
a neural network as detecting simple functions, like edges, and then composing them together in the later layers of the neural network so that they can learn more and more complex functions. These visualizations will make more sense when we talk about convolutional nets. One technical detail of this visualization: the edge detectors are looking at relatively small areas of an image, maybe very small regions like that, and then the facial detectors can look at maybe much larger areas of the image. But the main intuition to take away from this is just finding simple things like edges, then building them up, composing them together to detect more complex things like an eye or a nose, then composing those together to find even more complex things.

This type of simple-to-complex hierarchical representation, or compositional representation, applies to other types of data than images and face recognition as well. For example, if you're trying to build a speech recognition system, it's hard to visualize speech, but if you input an audio clip, then maybe the first level of a neural network might learn to detect low-level audio waveform features, such as: is this tone going up, is it going down, is it white noise or a sibilant sound, and what is the pitch? It can learn to detect low-level waveform features like that. Then, by composing low-level waveforms, maybe it'll learn to detect basic units of sound. In linguistics they're called phonemes. For example, in the word "cat", the "c" is a phoneme, the "a" is a phoneme, the "t" is another phoneme. So it learns to find maybe the basic units of sound, and then by composing those together, maybe it learns to recognize words in the audio, and then maybe composes those together in order to recognize entire phrases or sentences. So a deep neural network with multiple hidden layers might be able to have the earlier layers learn these lower-level, simpler features and then have the later, deeper layers put
together the simpler things it has detected in order to detect more complex things, like recognizing specific words or even phrases or sentences that you're saying, in order to carry out speech recognition. What we see is that whereas the earlier layers are computing what seem like relatively simple functions of the input, such as where the edges are, by the time you get deep in the network you can actually do surprisingly complex things, such as detect faces or detect words or phrases or sentences.

Some people like to make an analogy between deep neural networks and the human brain, where we believe, or neuroscientists believe, that the human brain also starts off detecting simple things like edges in what your eyes see and then builds those up to detect more complex things like the faces that you see. I think analogies between deep learning and the human brain are sometimes a little bit dangerous, but there is a lot of truth to this being how we think the human brain works, and that the human brain probably detects simple things like edges first and then puts them together to form more and more complex objects. So that has served as a loose form of inspiration for some deep learning as well. We'll say a bit more about the human brain, or about the biological brain, in a later video this week.

The other piece of intuition about why deep networks seem to work well is the following. This result comes from circuit theory, which pertains to thinking about what types of functions you can compute with different AND gates, OR gates, and NOT gates, basically logic gates. So informally, there are functions you can compute with a relatively small but deep neural network, and by small I mean the number of hidden units is relatively small, but if you try to compute the same function with a shallow network, so if you aren't allowed enough hidden layers, then you might require exponentially more hidden units to compute it. So let me just give you one example and illustrate this a bit
informally. Let's say you're trying to compute the exclusive-or, or the parity, of all your input features. So you're computing x1 XOR x2 XOR x3 XOR ... up to xn, if you have n or n_x features. Suppose you build an XOR tree like this: first compute the XOR of x1 and x2, then take x3 and x4 and compute their XOR. Technically, if you're just using AND, OR, and NOT gates, you might need a couple of layers to compute the XOR function rather than just one layer, but with a relatively small circuit you can compute the XOR, and so on. Then you can build an XOR tree like so, until eventually you have a circuit here that outputs, let's call this y, that outputs y hat equals y, the exclusive-or, the parity of all of these input bits. So to compute the XOR, the depth of the network will be on the order of log n with this type of XOR tree. The number of nodes, or the number of circuit components, or the number of gates in this network is not that large; you don't need that many gates in order to compute the exclusive-or.

But now, if you're not allowed to use a neural network with multiple hidden layers, in this case order log n hidden layers, if you're forced to compute this function with just one hidden layer, so you have all these inputs going into the hidden units and then these outputting y, then in order to compute the parity of x, to compute this XOR function, this hidden layer will need to be exponentially large, because essentially you need to exhaustively enumerate all 2^n possible configurations, so on the order of 2^n possible configurations of the input bits that result in the exclusive-or being either one or zero. So you end up needing a hidden layer that is exponentially large in the number of bits. I think technically you could do this with 2^(n-1) hidden units, but that's still on the order of 2^n, so it's going to be exponentially large in the number of bits. So I hope this
gives a sense that there are mathematical functions that are much easier to compute with deep networks than with shallow networks. I have to admit, I personally found the result from circuit theory less useful for gaining intuitions, but this is one of the results that people often cite when explaining the value of having very deep representations.

Now, in addition to these reasons for preferring deep neural networks, to be perfectly honest, I think the other reason the term deep learning has taken off is just branding. These things used to be called neural networks with a lot of hidden layers, but the phrase deep learning is just a great brand; it just is so deep. So I think that once that term caught on, neural networks rebranded, or neural networks with many hidden layers rebranded, helped to capture the popular imagination as well. But regardless of the PR branding, deep networks do work well.

Sometimes people go overboard and insist on using tons of hidden layers, but when I'm starting on a new problem, I'll often start out with even logistic regression, then try something with one or two hidden layers, and treat the number of hidden layers as a parameter or hyperparameter that you tune in order to try to find the right depth for your neural network. But over the last several years there has been a trend toward people finding that, for some applications, very, very deep neural networks, maybe many dozens of layers, can sometimes be the best model for a problem.

So that's it for the intuitions for why deep learning seems to work well. Let's now take a look at the mechanics of how to implement not just forward propagation but also back propagation.
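
The parity example from the transcript can be tried out on small inputs. The following is a minimal sketch, not code from the video: it works with plain Boolean values rather than a trained neural network, and the function names `xor_tree` and `shallow_parity_patterns` are hypothetical, chosen only for this illustration. It counts the gates and depth of a pairwise XOR tree, and separately lists the input patterns a single enumerating layer would need one unit for.

```python
from itertools import product


def xor_tree(bits):
    """Compute the parity of `bits` with a pairwise XOR tree.

    Returns (parity, depth, gates), where depth is the number of tree
    levels and gates is the number of two-input XOR gates used.
    Assumes `bits` is a non-empty sequence of 0/1 integers.
    """
    level = list(bits)
    depth = 0
    gates = 0
    while len(level) > 1:
        next_level = []
        # XOR neighbouring pairs: (0, 1), (2, 3), ...
        for i in range(0, len(level) - 1, 2):
            next_level.append(level[i] ^ level[i + 1])
            gates += 1
        # With an odd count, the last value passes through unchanged.
        if len(level) % 2 == 1:
            next_level.append(level[-1])
        level = next_level
        depth += 1
    return level[0], depth, gates


def shallow_parity_patterns(n):
    """List every n-bit input whose parity is 1.

    This mimics the enumeration argument: a one-hidden-layer circuit
    that recognises each such pattern with its own unit needs one unit
    per entry in this list, i.e. 2**(n - 1) units.
    """
    return [p for p in product((0, 1), repeat=n) if sum(p) % 2 == 1]


if __name__ == "__main__":
    for n in (4, 8, 16):
        _, depth, gates = xor_tree([1] * n)
        units = len(shallow_parity_patterns(n))
        print(f"n={n}: tree depth={depth}, tree gates={gates}, "
              f"enumerated patterns={units}")
```

For n = 8 the tree uses 7 gates at depth 3, while the enumeration lists 128 patterns, which matches the 2^(n-1) figure mentioned in the transcript. The sketch only counts patterns; it does not prove that no smaller one-hidden-layer network exists.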

Original Description

Take the Deep Learning Specialization: http://bit.ly/32Iw01H
Check out all our courses: https://www.deeplearning.ai
Subscribe to The Batch, our weekly newsletter: https://www.deeplearning.ai/thebatch
Follow us:
Twitter: https://twitter.com/deeplearningai_
Facebook: https://www.facebook.com/deeplearningHQ/
LinkedIn: https://www.linkedin.com/company/deeplearningai

This video explains why depth matters in neural networks: earlier layers learn simple features, and later layers compose them into more complex ones. It also presents a circuit-theory argument for why some functions are far cheaper to compute with deep networks than with shallow ones, and closes with practical advice on treating the number of hidden layers as a hyperparameter to tune.

Key Takeaways
  1. Understand the concept of hierarchical representation
  2. Learn how deep neural networks can compose simpler functions to learn complex ones
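  3. See why the parity (XOR) of n inputs can be computed by a tree of depth on the order of log n with few gates, while a single hidden layer needs on the order of 2^n hidden units
  4. Treat the number of hidden layers as a hyperparameter, starting from simple models such as logistic regression
💡 Deep neural networks can compute certain mathematical functions much more easily than shallow networks, which is one of the reasons often cited for using deep representations.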
