Neural Architecture Search
Key Takeaways
This video discusses Neural Architecture Search (NAS) and its application in designing neural network layers using meta-learning and reinforcement learning, with tools such as Recurrent Neural Networks (RNNs), Scheduled Drop Path, and Proximal Policy Optimization.
Full Transcript
[Music] This video will explain neural architecture search. Neural architecture search belongs to a family of deep learning methods known as meta-learning. Meta-learning is the idea of using an auxiliary search algorithm, such as random search, manual search, grid search, evolutionary search, or reinforcement learning, in order to design the characteristics of a neural network. These characteristics of the neural network can be on the surface level, with things like the learning rate and the beta terms of optimizers, and then things like the number of filter maps, activations, and other high-level decisions. In neural architecture search, the meta-learning takes a step inside the network. Some other examples of this are searching for activation functions and AutoAugment. In AutoAugment, the meta-learning algorithm learns a set of data augmentation policies, like shearing and rotating images, in order to get more out of a dataset. In searching for activation functions, a series of functions are embedded in a discrete search space and a search algorithm designs novel activation functions from them.
So the quick overview of neural architecture search is that they are going to design two convolutional layers, a normal cell and a reduction cell. The reduction cell is used to reduce the spatial resolution of layers. The algorithm is going to choose from these operations using this recurrent network procedure, and then this is an example of a discovered layer from the neural architecture search algorithm.
So now into the presentation. The key idea is the NASNet cell. The idea of the algorithm is to design a single convolutional layer rather than the entire network. The overall architecture of the network is manually predetermined, and it's going to consist of repeatedly stacking the found normal and reduction layers on top of each other. So again, the normal layer returns a feature map of the same dimension. In convolutional layers, if you take an input image of 32 by 32 height and width and you slide a 3x3 kernel over it with no padding to convolve it and produce new features, you're going to have 30 by 30 height by width, just due to the sliding of a 3x3 window on a 32 by 32 grid. So normal layers are going to return the same spatial resolution, and reduction layers are going to reduce the height and width by a factor of two. In some designs the normal cell is repeated N times before a reduction cell, and this N is a hyperparameter of the meta-learning algorithm. One other detail is that when they use the ImageNet dataset, they're going to need more reduction cells, because they need to reduce the number of pixels in each feature map to save computation.
So this is the high-level idea: all the convolutional nets in the search space are composed of these designed layers from the search algorithm, the normal cell and the reduction cell. They have an identical structure that is repeated several times, and then they're trained as a normal convolutional network, with each layer having different weights during training.
What they're going to do is search on a smaller dataset and then transfer the learned layer to the ImageNet dataset. So CIFAR-10: the images shown on this slide are actually how small CIFAR-10 images are. They are only 32 by 32 RGB images, which is really small. In addition to this, CIFAR-10 contains 50,000 training samples, compared to ImageNet, where the images are much larger, typically processed at about 300 by 300 resolution, and there are 1.2 million samples.
So here we're going to get more into the details of how they design the layer, how they use reinforcement learning, and how they use a recurrent neural network to predict the layer. The high-level idea of recurrent neural networks, and things like LSTMs, is that they process sequences. Rather than having fixed data like an image matrix, where it's the same thing throughout, they break data up into sequences. With language models, for example, one word is fed in at a time. The way that this works is that the network has hidden states, so it has its own hidden memory in addition to the new input at each time step. LSTMs are more advanced, with their own forget gate and auxiliary parameter terms like this.
So what it's going to do is select a hidden state, and the way it predicts is that it conditions itself on its previous predictions with its internal state. As it processes its own sequence of predictions, it conditions further predictions on what it has already predicted. It's going to make these five steps in its recurrent prediction: it's going to select a hidden state, then it's going to select another hidden state in the layer (these hidden states are like feature maps), then it's going to select an operation to apply to each of the hidden states, and it's going to define a way to combine the outputs of the operations from the hidden states it has chosen. These are the operations that it can choose from: it can take the feature map and do a 1x1 convolution on it, a 3x3 convolution, or max pooling; it can choose any of these discrete operations to apply to the hidden states selected. After it does that, it can either have an element-wise addition between the two states or it can just concatenate them along the filter dimension. Again, it's predicting normal and reduction cells, so it's going to make 2 × 5B predictions in total, with B being the number of connections designed internally in the layer.
The illustration is like this: it'll select hidden layer A and hidden layer B, then it will choose two operations, one for each hidden layer, from this discrete search space, and then it will choose a way of aggregating the new feature maps. That will result in the design of layers such as this. Seeing the picture helps to understand why B equals five, in addition to the five predictions it makes, which can be confusing; B refers to how many of these internal blocks it constructs. Anyway, you can see that it selects a lot of separable convolutions. That's one of the key findings in the paper: NASNet really likes these separable convolutional layers. It's kind of similar to the Inception network and the Network in Network design, in how they split up the feature maps to go all these different ways, but this is a really interesting, complex design that it comes up with.
So again, what about random search, rather than going through the trouble of proximal policy optimization and using the recurrent neural network controller to design these layers? This plot shows the comparison of using the reinforcement learning search technique with the recurrent neural network against just randomly searching through the different operations and the different hidden states to concatenate. This result shows that reinforcement learning gets over a 1% improvement compared to random search. In addition to the 1% improvement on the top model, reinforcement learning also finds an entire range of better models: if you compare the top 5 and top 25 models found by the two methods, reinforcement learning heavily outperforms random search.
Another technique that they use for the optimization of neural architecture search is scheduled drop path. The idea is to drop some of the paths that send the feature maps to different layers with some probability, similar to dropout, where you just X out neurons in something like a multi-layer perceptron. The idea of scheduled drop path is that as training progresses, they increase the frequency at which they drop the paths.
One of the key takeaways from the paper is that it takes them four days on 500 GPUs to train this method, and this is still seven times faster than the previous approaches. The previous approach to this took 800 GPUs for 28 days, accounting for 22,000 GPU hours. But then again, the GPUs that they use in this paper are significantly better than the old GPUs, so they estimate that this technique is about seven times faster than previous neural architecture search algorithms.
With the results, they are able to achieve a 1.2 percent improvement in top-1 accuracy on ImageNet with 9 billion fewer floating-point operations, and this is huge because it is totally automated; there aren't any human-designed features. It achieves a 2.4 percent error rate on CIFAR-10 as well. This table shows neural architecture search compared to methods such as DenseNet and Shake-Shake regularization. These are the results on the ImageNet dataset: this plot shows how neural architecture search is able to achieve higher performance with less computation than previous human-engineered neural network designs.
One interesting thing is skip connections. Skip connections are found to work really well in networks such as ResNet, shown on the left, and DenseNet, on the right, but here they are just repeatedly concatenating these layers without any skip connections. They also tested manually adding the skip connections afterwards, and they found that this didn't improve performance.
One other interesting application of neural architecture search is to use it as features for object detection. What they do is combine the region proposal network from Faster R-CNN with the neural architecture search image features.
Thanks for watching this video on neural architecture search. The paper link is provided in the description. Please subscribe to this channel for more videos on deep learning. [Music]
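The 32 by 32 to 30 by 30 example from the transcript follows from the standard convolution output-size formula. The sketch below is only illustrative: the function name `conv_output_size` and the `stride` and `padding` parameters are assumptions added here, not something the video defines. It shows how a normal cell can keep the resolution (with padding) while a reduction cell halves it (with stride 2).

```python
def conv_output_size(input_size, kernel_size, stride=1, padding=0):
    """Spatial size after sliding a square kernel over a square input.

    Standard formula: floor((input + 2 * padding - kernel) / stride) + 1.
    """
    if stride < 1:
        raise ValueError("stride must be at least 1")
    return (input_size + 2 * padding - kernel_size) // stride + 1


# The transcript's example: 3x3 kernel, no padding, on a 32x32 image.
no_padding = conv_output_size(32, 3)

# Assumed settings for illustration: padding of 1 preserves the size
# (normal cell), stride 2 halves it (reduction cell).
same_size = conv_output_size(32, 3, padding=1)
halved = conv_output_size(32, 3, stride=2, padding=1)

print(no_padding, same_size, halved)
```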
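The five-step prediction described in the transcript can be sketched as a sampling routine. This is a minimal illustration under stated assumptions: it uses uniform random choices (the random search baseline mentioned in the video) in place of the recurrent controller, and the operation names, the two starting hidden states, and all function names are hypothetical placeholders rather than the paper's exact search space.

```python
import random

# Hypothetical operation and combiner names, for illustration only.
OPERATIONS = ["conv_1x1", "conv_3x3", "sep_conv_3x3", "max_pool_3x3"]
COMBINERS = ["add", "concat"]


def sample_cell(num_blocks, rng):
    """Sample one cell: five decisions per block, num_blocks blocks."""
    # Assumption: two earlier feature maps are available as initial inputs.
    hidden_states = ["h_prev", "h_prev_prev"]
    blocks = []
    for b in range(num_blocks):
        block = {
            "input_a": rng.choice(hidden_states),   # step 1: first hidden state
            "input_b": rng.choice(hidden_states),   # step 2: second hidden state
            "op_a": rng.choice(OPERATIONS),         # step 3: operation for input_a
            "op_b": rng.choice(OPERATIONS),         # step 4: operation for input_b
            "combine": rng.choice(COMBINERS),       # step 5: how to merge outputs
        }
        blocks.append(block)
        # The new block's output becomes selectable by later blocks.
        hidden_states.append(f"block_{b}")
    return blocks


def sample_architecture(num_blocks=5, seed=0):
    """Sample a normal cell and a reduction cell with the same routine."""
    rng = random.Random(seed)
    return {
        "normal": sample_cell(num_blocks, rng),
        "reduction": sample_cell(num_blocks, rng),
    }


def count_decisions(architecture):
    """Total number of discrete predictions, expected to be 2 * 5 * B."""
    return sum(len(block) for cell in architecture.values() for block in cell)


arch = sample_architecture(num_blocks=5)
print(count_decisions(arch))
```

With B = 5 blocks per cell, the count comes out to 2 × 5 × 5 decisions, matching the 2 × 5B figure in the transcript. A reinforcement learning controller would replace `rng.choice` with learned, conditioned predictions.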
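Scheduled drop path, as described in the transcript, raises the drop frequency as training progresses. The sketch below assumes a linear schedule, a final drop probability of 0.3, and dropout-style rescaling of the surviving paths; these values and the function names are illustrative assumptions, not settings confirmed by the video.

```python
import random


def drop_path_probability(step, total_steps, final_drop_prob=0.3):
    """Assumed linear schedule: 0 at the start, final_drop_prob at the end."""
    if total_steps <= 0:
        raise ValueError("total_steps must be positive")
    progress = min(max(step / total_steps, 0.0), 1.0)
    return final_drop_prob * progress


def apply_drop_path(path_outputs, drop_prob, rng):
    """Zero each path independently with probability drop_prob.

    Surviving paths are rescaled by 1 / (1 - drop_prob), as in dropout,
    so the expected value of each path is unchanged (an assumption here).
    """
    keep_prob = 1.0 - drop_prob
    result = []
    for value in path_outputs:
        if rng.random() < drop_prob:
            result.append(0.0)
        else:
            result.append(value / keep_prob)
    return result


rng = random.Random(0)
for step in (0, 50, 100):
    p = drop_path_probability(step, total_steps=100)
    print(step, round(p, 3), apply_drop_path([1.0, 2.0, 3.0], p, rng))
```

Early in training almost every path is kept, so the cell trains with its full connectivity; later, paths are dropped more often, which acts as a stronger regularizer.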
Original Description
Meta-Learning is one of the most interesting methods powering next-generation Deep Neural Networks. This video will explain the idea of using Search algorithms to design Neural Network layers!
Thanks for watching, please Subscribe for more Deep Learning videos!
Paper Link:
https://arxiv.org/abs/1707.07012
Playlist
Uploads from Connor Shorten · Connor Shorten · 20 of 60