Neural Architecture Search

Connor Shorten · Advanced ·📐 ML Fundamentals ·7y ago

Key Takeaways

This video discusses Neural Architecture Search (NAS) and its application in designing neural network layers using meta-learning and reinforcement learning, with tools such as Recurrent Neural Networks (RNNs), Scheduled Drop Path, and Proximal Policy Optimization.

Full Transcript

[Music] This video will explain neural architecture search. Neural architecture search belongs to a family of deep learning methods known as meta-learning. Meta-learning is the idea of using an auxiliary search algorithm, such as random search, manual search, grid search, evolutionary search, or reinforcement learning, to design the characteristics of a neural network. These characteristics can be on the surface level, with things like the learning rate and the beta terms of optimizers, and then things like the number of filters, activation functions, and other high-level decisions. In neural architecture search, the meta-learned characteristics take a step inside the network. Some other examples of this are searching for activation functions and AutoAugment. In AutoAugment, the meta-learning algorithm learns a set of data augmentation policies, like shearing and rotating images, in order to get more out of the dataset. In the search for activation functions, a series of functions are embedded in a discrete search space and a search algorithm designs novel activation functions from them.

The quick overview of neural architecture search is that they are going to design two convolutional layers: a normal cell and a reduction cell. The reduction cell is used to reduce the spatial resolution of layers. The algorithm is going to choose from these operations using this recurrent network procedure, and then this is an example of a discovered layer from the neural architecture search algorithm.

So now into the presentation. The key idea is the NASNet cell: the idea of the algorithm is to design a single convolutional layer rather than the entire network. The overall architecture of the network is manually predetermined, and it consists of repeatedly stacking the found normal and reduction layers on top of each other. So again, the normal layer returns a feature map of the same dimension. In convolutional layers, if you take an input
image of 32 by 32 (height by width) and slide a 3x3 kernel over it to convolve it and produce new features, you now have 30 by 30 (height by width) unless you pad the input, just from sliding a 3x3 window over a 32 by 32 grid. Normal layers are going to return the same spatial resolution, and reduction layers are going to reduce the height and width by a factor of two. In some designs the normal cell is repeated N times before a reduction cell, and this N is a hyperparameter of the meta-learning algorithm. One other detail is that when they use the ImageNet dataset, they need more reduction cells, because they need to reduce the number of pixels in each feature map to save computation.

So this is the high-level idea: all the convolutional nets in the search space are composed of these designed layers from the search algorithm, the normal cell and the reduction cell. They have an identical structure that is repeated several times, and then they're trained as a normal convolutional network, with each layer having different weights during training.

What they're going to do is search on a smaller dataset and then transfer the learned layer to the ImageNet dataset. The images shown on this slide are actually how small CIFAR-10 images are: they are only 32 by 32 RGB images, which is really small. In addition to this, CIFAR-10 contains 50,000 training images, compared to ImageNet, where the images are much larger, typically processed at about 300 by 300 resolution, and there are 1.2 million samples.

Here we're going to get more into the details of how they design the layer, how they use reinforcement learning, and how they use a recurrent neural network to predict the layer. The high-level idea of recurrent neural networks, and things like LSTMs, is that they process sequences. Rather than having fixed data like an image matrix, where it's the same thing throughout, they break data up into sequences
like with language models, where one word is fed in at a time. The way this works is that the network has hidden states, so it has its own hidden memory in addition to the new input at each time step. LSTMs are more advanced, with their own forget gate and auxiliary parameters like this.

What it's going to do is condition itself on its previous predictions through its internal state. As it processes its own sequence of predictions, it conditions further predictions on what it has already predicted. It makes these five steps in its recurrent prediction: it selects a hidden state, then it selects another hidden state in the layer (these hidden states are like feature maps), then it selects an operation to apply to each of the two hidden states, and finally it defines a way to combine the outputs of the operations from the hidden states it has chosen.

These are the operations that it can choose from. It can take the feature map and do a 1x1 convolution on it, a 3x3 convolution, or max pooling; it can choose any of these discrete operations to apply to the selected hidden states. After it does that, it can either do an element-wise addition between the two states or concatenate them along the filter dimension. Again, it's predicting normal and reduction cells, so it's going to make 2 × 5B predictions in total, with B being the number of blocks designed internally in the layer.

The illustration is like this: it selects a hidden layer A and a hidden layer B, then it chooses two operations, one for each hidden layer, from this discrete search space, and then it chooses a way of aggregating the new feature maps. That results in the design of layers such as this. Seeing the
picture helps to understand why B equals five, in addition to the five predictions it makes, which can be confusing; B refers to how many of these internal blocks it constructs. Anyway, you can see that it selects a lot of separable convolutions. That's one of the key findings in the paper: the NASNet search really likes these separable convolutional layers. It's kind of similar to the Inception network and the Network in Network design, in how they split up the feature maps to go all these different ways, but this is a really interesting, complex design that it comes up with.

So again, what about random search, rather than going through the trouble of proximal policy optimization and using the recurrent neural network controller to design these layers? This plot shows the comparison of the reinforcement learning search technique with the recurrent neural network against just randomly searching through the different operations and the different hidden states to combine. This result shows that reinforcement learning gets over a 1% improvement over random search. In addition to the 1% improvement on the top model, reinforcement learning also finds better models across an entire range: if you compare the top 5 and top 25 models found by the two methods, reinforcement learning heavily outperforms random search.

Another technique that they use for the optimization of neural architecture search is scheduled drop path. The idea is to drop some of the paths that send the feature maps to different layers with some probability, similar to dropout, where you just X out neurons in something like a multi-layer perceptron. The idea of scheduled drop path is that, as training progresses, they increase the probability with which they drop the paths.

One of the key takeaways from the paper is that it takes them four days on 500 GPUs to run this method, and this is still seven times faster than the previous
approaches. The previous approach took 800 GPUs for 28 days, accounting for about 22,000 GPU hours. Then again, the GPUs that they use in this paper are significantly better than the old GPUs, so they estimate this technique is about seven times faster than previous neural architecture search algorithms.

With the results, they are able to achieve a 1.2 percent improvement in top-1 accuracy on ImageNet with 9 billion fewer floating-point operations, and this is huge because this is totally automated; there aren't any human-designed features. It achieves a 2.4 percent error rate on CIFAR-10 as well. This table shows neural architecture search compared to methods such as DenseNet and Shake-Shake regularization. These are the results on the ImageNet dataset: this plot shows how neural architecture search is able to achieve higher performance with less computation than previous human-engineered neural network designs.

One interesting thing is skip connections. Skip connections are found to work really well in networks such as ResNet, shown on the left, and DenseNet, on the right, but here they are just repeatedly stacking the discovered layers without any skip connections. They also tested manually adding the skip connections, and they found that this didn't improve performance.

One other interesting application of neural architecture search is to use it as features for object detection. What they do is combine the region proposal network from Faster R-CNN with the neural architecture search image features.

Thanks for watching this video on neural architecture search. The paper link is provided in the description. Please subscribe to this channel for more videos on deep learning. [Music]
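
The five-step prediction described in the transcript can be sketched in a few lines of Python. This is only a rough illustration, not the paper's controller: it picks every decision uniformly at random (closer to the random-search baseline than to the RNN trained with reinforcement learning), the operation list is a short illustrative subset, and names such as `sample_cell`, `h_prev`, and `h_prev_prev` are hypothetical. It assumes, as in the paper, that each block can read from the outputs of the two previous cells or from earlier blocks in the same cell.

```python
import random

# Illustrative subset of operations: the three named in the transcript
# plus a separable convolution. The paper's full list is longer.
OPERATIONS = ["conv_1x1", "conv_3x3", "max_pool_3x3", "sep_conv_3x3"]
COMBINERS = ["add", "concat"]


def sample_cell(num_blocks=5, rng=None):
    """Sample one cell as a list of blocks, five decisions per block."""
    rng = rng or random.Random()
    # Hidden states available at the start: outputs of the two previous cells
    # (hypothetical names).
    hidden_states = ["h_prev", "h_prev_prev"]
    blocks = []
    for b in range(num_blocks):
        block = {
            "input_1": rng.choice(hidden_states),  # step 1: first hidden state
            "input_2": rng.choice(hidden_states),  # step 2: second hidden state
            "op_1": rng.choice(OPERATIONS),        # step 3: op for input_1
            "op_2": rng.choice(OPERATIONS),        # step 4: op for input_2
            "combine": rng.choice(COMBINERS),      # step 5: add or concat
        }
        blocks.append(block)
        # The new block's output becomes selectable by later blocks.
        hidden_states.append(f"block_{b}")
    return blocks


def sample_architecture(num_blocks=5, seed=0):
    """Sample a normal cell and a reduction cell with a shared RNG."""
    rng = random.Random(seed)
    return {
        "normal": sample_cell(num_blocks, rng),
        "reduction": sample_cell(num_blocks, rng),
    }


arch = sample_architecture()
num_predictions = sum(len(block) for cell in arch.values() for block in cell)
print(num_predictions)  # 2 cells * 5 decisions * 5 blocks = 50
```

With B = 5 the count comes out to 2 × 5 × 5 = 50 decisions, which matches the 2 × 5B figure mentioned above. In the actual method an RNN replaces the uniform `rng.choice` calls, so each decision is conditioned on the ones made before it.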

Original Description

Meta-Learning is one of the most interesting methods powering next-generation Deep Neural Networks. This video will explain the idea of using Search algorithms to design Neural Network layers! Thanks for watching, please Subscribe for more Deep Learning videos! Paper Link: https://arxiv.org/abs/1707.07012

This video teaches how to apply Neural Architecture Search (NAS) to design neural network layers using meta-learning and reinforcement learning, with a focus on optimization techniques such as Scheduled Drop Path. The video discusses the application of NAS to various tasks, including image classification and object detection.
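
The Scheduled Drop Path idea mentioned above can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the paper's implementation: the drop probability is ramped linearly over training, the final value `final_drop_prob=0.3` is a hypothetical setting, each path output is simplified to a single float, and surviving paths are rescaled as in standard inverted dropout (an assumption of this sketch). The function names are illustrative.

```python
import random


def drop_path_probability(step, total_steps, final_drop_prob=0.3):
    """Linearly ramp the drop probability from 0 up to final_drop_prob.

    final_drop_prob is a hypothetical value chosen for illustration.
    """
    progress = min(max(step / total_steps, 0.0), 1.0)
    return final_drop_prob * progress


def apply_drop_path(path_outputs, drop_prob, rng):
    """Zero out each path independently with probability drop_prob.

    Surviving paths are divided by (1 - drop_prob), as in inverted dropout,
    so the expected value of each path stays the same (sketch assumption).
    """
    keep_prob = 1.0 - drop_prob
    result = []
    for value in path_outputs:
        if rng.random() < drop_prob:
            result.append(0.0)  # this path is dropped for this step
        else:
            result.append(value / keep_prob)
    return result


rng = random.Random(0)
total_steps = 1000
for step in (0, 500, 1000):
    p = drop_path_probability(step, total_steps)
    outputs = apply_drop_path([1.0, 2.0, 3.0, 4.0], p, rng)
    print(step, round(p, 3), outputs)
```

Early in training almost every path is kept, and the fraction of dropped paths grows as the step count approaches `total_steps`.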

Key Takeaways
  1. Apply meta-learning to NAS
  2. Use reinforcement learning for optimization
  3. Design neural network layers using RNNs
  4. Evaluate model performance using ImageNet
  5. Apply NAS to object detection tasks
💡 The use of reinforcement learning and meta-learning in NAS can lead to significant improvements in model performance and efficiency, such as a 1.2% improvement in top-1 accuracy on ImageNet with 9 billion fewer floating-point operations.
