Neural Architecture Search

Connor Shorten · Advanced ·📐 ML Fundamentals ·7y ago

Key Takeaways

This video discusses Neural Architecture Search (NAS) and its application in designing neural network layers using meta-learning and reinforcement learning, with tools such as Recurrent Neural Networks (RNNs), Scheduled Drop Path, and Proximal Policy Optimization.

Full Transcript

[Music] This video will explain neural architecture search. Neural architecture search belongs to a family of deep learning methods known as meta-learning. Meta-learning is the idea of using an auxiliary search algorithm, such as random search, manual search, grid search, evolutionary search, or reinforcement learning, to design the characteristics of a neural network. These characteristics can be on the surface level, with things like the learning rate, the beta terms of optimizers, the number of filter maps, and other high-level decisions. In neural architecture search, meta-learning takes a step inside the network. Some other examples of this are searching for activation functions and AutoAugment. In AutoAugment, the meta-learning algorithm learns a set of data augmentation policies, like shearing and rotating images, in order to get more effective data augmentation. In the search for activation functions, a set of candidate functions is embedded in a discrete search space, and a search algorithm combines them into novel activation functions.

The quick overview of neural architecture search is that it designs two convolutional layers, a normal cell and a reduction cell, where the reduction cell is used to reduce the spatial resolution of the feature maps. The algorithm chooses from a set of operations using a recurrent network controller, and this is an example of a layer discovered by the neural architecture search algorithm.

Now into the presentation. The key idea is the NASNet cell: the algorithm designs a single convolutional layer rather than the entire network. The overall architecture of the network is manually predetermined, and it consists of repeatedly stacking the discovered normal and reduction cells on top of each other.

Again, the normal cell returns a feature map with the same spatial dimensions as its input. In convolutional layers, if you take a 32 by 32 (height by width) input image and slide a 3x3 kernel over it without padding, you end up with a 30 by 30 output, just from sliding a 3 by 3 window over a 32 by 32 grid. Normal cells therefore keep the same spatial resolution, while reduction cells reduce the height and width by a factor of two. In some designs the normal cell is repeated N times before a reduction cell, and this N is a hyperparameter of the meta-learning setup. One other detail is that on the ImageNet dataset they need more reduction cells, because they need to reduce the number of pixels in each feature map to save computation.

So this is the high-level idea: all the convolutional nets in the search space are composed of the layers designed by the search algorithm, the normal cell and the reduction cell. These cells have an identical structure that is repeated several times, and the network is then trained as a normal convolutional network, with each layer having its own weights.

What they do is search on a smaller dataset and then transfer the learned cell to the ImageNet dataset. The CIFAR-10 images shown on this slide are actually how small CIFAR-10 images are: they're only 32 by 32 RGB images, which is really small. In addition, CIFAR-10 contains 50,000 training images, compared to ImageNet, where the images are much larger, typically processed at around 300 by 300 resolution, and there are 1.2 million training samples.

Now we'll get into the details of how they design the cell: how they use reinforcement learning, and how they use a recurrent neural network to predict the layer. The high-level idea of recurrent neural networks, including LSTMs, is that they process sequences. Rather than having fixed data like an image matrix, they break data up into sequences; with language models, for example, one word is fed in at a time. The network has a hidden state, its own internal memory, in addition to the new input at each time step, and LSTMs are more advanced, with their own forget gate and auxiliary parameters like this.

The controller conditions each prediction on its previous predictions through its internal state, so as it processes its own sequence of predictions, it conditions further predictions on what it has already predicted. It makes five steps in its recurrent prediction: it selects a hidden state, then it selects another hidden state in the layer (these hidden states are like feature maps), then it selects an operation to apply to each of the two hidden states, and finally it selects a way to combine the outputs of the operations applied to the chosen hidden states.

These are the operations it can choose from: it can take the feature map and apply a 1x1 convolution, a 3x3 convolution, max pooling, or any of these other discrete operations to the selected hidden states. After that, it either applies an element-wise addition between the two states or concatenates them along the filter dimension. It predicts both a normal and a reduction cell, so it makes 2 × 5B predictions in total, with B being the number of blocks (internal connections) designed within the cell.

The illustration goes like this: it selects hidden layer A and hidden layer B, chooses an operation for each hidden layer from this discrete search space, and then chooses a way of aggregating the new feature maps, which results in the design of cells such as this one. Seeing the picture helps explain why B equals five. This is separate from the five predictions made for each block, which can be confusing: B refers to how many of these internal blocks the cell constructs. You can see that it selects a lot of separable convolutions; one of the key findings in the paper is that NASNet really likes these separable convolutional layers. It's somewhat similar to the Inception network and the Network-in-Network design, in how they split up the feature maps to go many different ways, but this is a really interesting, complex design that it comes up with.

So what about random search, rather than going through the trouble of Proximal Policy Optimization and the recurrent neural network controller to design these layers? This plot compares the reinforcement learning search technique with the recurrent neural network against simply searching randomly through the different operations and hidden states to combine. The result shows that reinforcement learning gets over a 1% improvement over random search on the top model, and it also finds better models across the whole range: if you compare the top 5 and top 25 models found by the two methods, reinforcement learning heavily outperforms random search.

Another technique they use in the optimization of neural architecture search is ScheduledDropPath. The idea is to drop some of the paths that send feature maps to different layers with some probability, similar to dropout, where you zero out neurons in a multi-layer perceptron. The "scheduled" part is that as training progresses, they increase the probability with which they drop the paths.

One of the key takeaways from the paper is that it takes them four days on 500 GPUs to run this search, and this is still about seven times faster than the previous approach, which took 800 GPUs for 28 days, or roughly 22,400 GPU-days. Then again, the GPUs used in this paper are significantly better than the older GPUs, so they estimate that this technique is about seven times faster than previous neural architecture search algorithms.

As for the results, they achieve a 1.2% improvement in top-1 accuracy on ImageNet over the best human-designed architectures, with 9 billion fewer floating-point operations (FLOPs). This is huge because the cell design is totally automated; it isn't based on human-designed features. It also achieves a 2.4% error rate on CIFAR-10. This table compares neural architecture search with methods such as DenseNet and Shake-Shake regularization, and these are the results on the ImageNet dataset. This plot shows how neural architecture search achieves higher performance with less computation than previous human-engineered neural network designs.

One interesting thing is skip connections. Skip connections are found to work really well in networks such as ResNet, shown on the left, and DenseNet, shown on the right, but NASNet is built by repeatedly stacking cells without any skip connections between them. They also tested manually adding skip connections between the cells and found that this didn't improve performance.

One other interesting application of neural architecture search is to use it as a feature extractor for object detection: they combine the region proposal network from Faster R-CNN with the NASNet image features. Thanks for watching this video on neural architecture search. The paper link is provided in the description; please subscribe to this channel for more videos on deep learning. [Music]
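The spatial-size bookkeeping in the transcript (a 3x3 kernel turning a 32 by 32 input into 30 by 30, normal cells keeping the resolution, reduction cells halving it) follows from the standard convolution output-size formula. Below is a minimal Python sketch; the helper names, the use of padding 1 for normal cells and stride 2 for reduction cells, and the simple "N normal cells, then one reduction cell" layout are illustrative assumptions, not the paper's exact configuration.

```python
def conv_output_size(size: int, kernel: int, stride: int = 1, padding: int = 0) -> int:
    """Standard output-size formula for a convolution or pooling window."""
    return (size + 2 * padding - kernel) // stride + 1


def stack_spatial_sizes(input_size: int, n_normal: int, n_reductions: int) -> list[tuple[str, int]]:
    """Trace the spatial size through a hypothetical stack of cells.

    Normal cells use 'same' padding with stride 1, so the size is unchanged;
    reduction cells use stride 2, halving height and width.
    """
    size = input_size
    trace = []
    for stage in range(n_reductions + 1):
        for _ in range(n_normal):
            # 3x3 window, stride 1, padding 1 keeps the size fixed
            size = conv_output_size(size, kernel=3, stride=1, padding=1)
            trace.append(("normal", size))
        if stage < n_reductions:
            # 3x3 window, stride 2, padding 1 halves an even size
            size = conv_output_size(size, kernel=3, stride=2, padding=1)
            trace.append(("reduction", size))
    return trace


print(conv_output_size(32, kernel=3))             # no padding: 30
print(conv_output_size(32, kernel=3, padding=1))  # same padding: 32
print([size for _, size in stack_spatial_sizes(32, n_normal=2, n_reductions=2)])
# [32, 32, 16, 16, 16, 8, 8, 8]
```

With a CIFAR-sized 32 by 32 input and two reduction cells, the feature maps shrink to 16 by 16 and then 8 by 8; a larger ImageNet input would need more reduction cells to reach a similarly small final resolution.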
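The controller's five decisions per block can be illustrated by sampling each of them uniformly at random, which is essentially the random-search baseline the transcript compares against. This is a hedged sketch: the operation names form a hypothetical, partial list, hidden states 0 and 1 are assumed to stand for the outputs of the two previous cells, and the RNN controller trained with Proximal Policy Optimization that NASNet actually uses is not implemented here.

```python
import random

# Hypothetical, partial list of candidate operations (labels only).
OPERATIONS = [
    "identity", "conv_1x1", "conv_3x3", "max_pool_3x3", "avg_pool_3x3",
    "sep_conv_3x3", "sep_conv_5x5", "sep_conv_7x7",
]
# Element-wise addition, or concatenation along the filter dimension.
COMBINERS = ["add", "concat"]


def sample_cell(num_blocks: int, rng: random.Random) -> list[tuple]:
    """Sample one cell as num_blocks blocks of five decisions each."""
    blocks = []
    num_hidden = 2  # assumed: states 0 and 1 are the two previous cell outputs
    for _ in range(num_blocks):
        first = rng.randrange(num_hidden)   # step 1: first input hidden state
        second = rng.randrange(num_hidden)  # step 2: second input hidden state
        op_first = rng.choice(OPERATIONS)   # step 3: op applied to the first input
        op_second = rng.choice(OPERATIONS)  # step 4: op applied to the second input
        combine = rng.choice(COMBINERS)     # step 5: how to merge the two results
        blocks.append((first, second, op_first, op_second, combine))
        num_hidden += 1  # this block's output becomes a selectable hidden state
    return blocks


def sample_architecture(num_blocks: int = 5, seed: int = 0) -> dict:
    """Randomly sample a normal cell and a reduction cell."""
    rng = random.Random(seed)
    return {
        "normal": sample_cell(num_blocks, rng),
        "reduction": sample_cell(num_blocks, rng),
    }


arch = sample_architecture(num_blocks=5, seed=0)
num_decisions = sum(len(block) for cell in arch.values() for block in cell)
print(num_decisions)  # 2 * 5 * B with B = 5, i.e. 50
```

Replacing the uniform `rng.choice` and `rng.randrange` calls with a learned policy is where the reinforcement learning comes in; the transcript reports that this gives over a 1% improvement on the top model compared with random search.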
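The final decision in each block, element-wise addition versus concatenation along the filter dimension, affects the shape of the block's output differently. A small sketch, assuming (height, width, channels) shape tuples and a hypothetical `combine_shape` helper; a real cell would presumably have to resize or project mismatched inputs first, which this sketch does not attempt.

```python
def combine_shape(
    shape_a: tuple[int, int, int], shape_b: tuple[int, int, int], method: str
) -> tuple[int, int, int]:
    """Output shape of merging two (height, width, channels) feature maps."""
    height_a, width_a, channels_a = shape_a
    height_b, width_b, channels_b = shape_b
    if (height_a, width_a) != (height_b, width_b):
        raise ValueError("both inputs must share the same spatial size")
    if method == "add":
        # element-wise addition needs identical shapes and keeps the channel count
        if channels_a != channels_b:
            raise ValueError("element-wise addition needs equal channel counts")
        return shape_a
    if method == "concat":
        # concatenation along the filter dimension sums the channel counts
        return (height_a, width_a, channels_a + channels_b)
    raise ValueError(f"unknown combine method: {method!r}")


print(combine_shape((16, 16, 64), (16, 16, 64), "add"))     # (16, 16, 64)
print(combine_shape((16, 16, 64), (16, 16, 64), "concat"))  # (16, 16, 128)
```

Addition keeps the number of filters fixed, while concatenation grows it, which is one reason the choice between them is a meaningful part of the search.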
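ScheduledDropPath, as described in the transcript, drops whole paths with some probability and raises that probability as training progresses. The sketch below assumes a linear schedule from 0 up to a chosen final probability and represents each path's output as a plain list of floats; the function names, the `final_drop_prob` parameter, and the dropout-style rescaling of surviving paths by 1 / (1 - p) are assumptions for illustration.

```python
import random


def scheduled_drop_prob(step: int, total_steps: int, final_drop_prob: float) -> float:
    """Drop probability grows linearly from 0 to final_drop_prob over training."""
    progress = min(max(step / total_steps, 0.0), 1.0)
    return final_drop_prob * progress


def drop_path(
    path_outputs: list[list[float]], drop_prob: float, rng: random.Random
) -> list[list[float]]:
    """Zero out each whole path independently with probability drop_prob.

    Surviving paths are rescaled by 1 / (1 - drop_prob), dropout-style
    (an assumption here), so the expected output is unchanged.
    """
    if drop_prob == 0.0:
        return [list(path) for path in path_outputs]
    keep_prob = 1.0 - drop_prob
    result = []
    for path in path_outputs:
        if rng.random() < keep_prob:
            result.append([x / keep_prob for x in path])
        else:
            result.append([0.0 for _ in path])
    return result


rng = random.Random(0)
paths = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]  # toy outputs of three paths
for step in (0, 5_000, 10_000):
    p = scheduled_drop_prob(step, total_steps=10_000, final_drop_prob=0.3)
    print(step, round(p, 3), drop_path(paths, p, rng))
```

Early in training the probability is near zero, so almost every path survives; by the last step, each path is dropped with probability `final_drop_prob`.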
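The search budgets quoted in the transcript can be compared with a little arithmetic. A rough sketch: "GPU-days" here is simply the number of GPUs times the number of days, which ignores the difference in GPU hardware the transcript mentions, and `gpu_days` is a hypothetical helper. The wall-clock ratio (28 days versus 4 days) happens to equal the quoted 7× figure, while the raw GPU-day ratio is about 11×.

```python
def gpu_days(num_gpus: int, days: int) -> int:
    """Total search budget as GPUs x days (hardware differences ignored)."""
    return num_gpus * days


previous = gpu_days(800, 28)  # earlier NAS search: 800 GPUs for 28 days
nasnet = gpu_days(500, 4)     # NASNet search: 500 GPUs for 4 days
print(previous, nasnet)       # 22400 2000
print(previous / nasnet)      # 11.2, ratio of GPU-days
print(28 / 4)                 # 7.0, ratio of wall-clock days
```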

Original Description

Meta-Learning is one of the most interesting methods powering next-generation Deep Neural Networks. This video will explain the idea of using Search algorithms to design Neural Network layers! Thanks for watching, please Subscribe for more Deep Learning videos! Paper Link: https://arxiv.org/abs/1707.07012

Playlist

Uploads from Connor Shorten · 20 of 60

DeepWalk Explained

Inception Network Explained

Progressive Growing of GANs Explained

Improved Techniques for Training GANs

Word2Vec Explained

Must Read Papers on GANs

Unsupervised Feature Learning

Self-Supervised GANs

Embedding Graphs with Deep Learning

Transfer Learning in GANs

ReLU Activation Function

AC-GAN Explained

SimGAN Explained

DC-GAN Explained!

ResNet Explained!

Graph Convolutional Networks

Neural Architecture Search

Video Classification with Deep Learning

BigGANs in Data Augmentation

Introduction to Deep Learning

EfficientNet Explained!

Self-Attention GAN

Curriculum Learning in Deep Neural Networks

Deep Learning Podcast #1 | Edward Dixon | Stochastic Weight Averaging

Deep Compression

Skin Cancer Classification with Deep Learning

Deep Learning Podcast #2 | Edward Peake | Deep Learning in Medical Imaging

The Lottery Ticket Hypothesis Explained!

GauGAN Explained!

AutoML with Hyperband

DL Podcast #3 | Yannic Kilcher | Population-Based Search

Weakly Supervised Pretraining

Image Data Augmentation for Deep Learning

Unsupervised Data Augmentation

Wide ResNet Explained!

RevNet: Backpropagation without Storing Activations

GANs with Fewer Labels

BigBiGAN Unsupervised Learning!

Self-Supervised Learning

Multi-Task Self-Supervised Learning

Self-Supervised GANs

Population Based Training

Show, Attend and Tell

Siamese Neural Networks

WaveGAN Explained!

VAE-GAN Explained!

Evolution in Neural Architecture Search!

AI Research Weekly Update August 18th, 2019

Weight Agnostic Neural Networks Explained!

AI Research Weekly Update August 25th, 2019

Neuroevolution of Augmenting Topologies (NEAT)

AI Research Weekly Update September 1st, 2019

Randomly Wired Neural Networks

This video teaches how to apply Neural Architecture Search (NAS) to design neural network layers using meta-learning and reinforcement learning, with a focus on optimization techniques such as Scheduled Drop Path. The video discusses the application of NAS to various tasks, including image classification and object detection.

Key Takeaways

Apply meta-learning to NAS
Use reinforcement learning to search over cell designs
Design neural network layers with an RNN controller
Evaluate model performance using ImageNet
Apply NAS to object detection tasks

💡 The use of reinforcement learning and meta-learning in NAS can lead to significant improvements in model performance and efficiency: NASNet achieves a 1.2% improvement in top-1 accuracy on ImageNet with 9 billion fewer floating-point operations (FLOPs).

More on: ML Maths Basics

Important Steps I Have Followed To Improve My Data Science Skills- Sharing My Experience

Learn Python FAST for Beginners 🚀#coding #conditionals #loops #functions

ChethanAIChronicles

“Hello, world” from scratch on a 6502 — Part 1

PCA (Principal Component Analysis) in Python - Machine Learning From Scratch 11 - Python Tutorial

ROC and AUC in R

StatQuest with Josh Starmer

Data Science Fundamentals: Data Cleaning in Python

Related Reads

Chunking Done Right: Normalization, sentence boundaries, and overlap

Master chunking techniques to improve retrieval pipeline performance and avoid common pitfalls

Medium · Programming

Why Materials Scientists Are Still Copy-Pasting Data from PDFs in 2026 (And Why AI Changes…

Materials scientists still copy-paste data from PDFs, but AI can change this tedious task

Medium · Machine Learning

From Python Slop to 4µs Rust: How We Accelerated Market Microstructure Simulations by 25,000x

Accelerate market microstructure simulations by 25,000x by migrating from Python to Rust, learning how to optimize performance-critical code

Medium · Data Science

Crafting the Optimal Path: A Deep-Dive Evaluation of Informed vs.

Learn to evaluate and optimize grid-based pathfinding algorithms for informed and uninformed searches in Python

Medium · Python

Is Python Dead in 2026?| Truth About Python in AI Era | 90 Days Roadmap @FameWorldEducationalHub

FAME WORLD EDUCATIONAL HUB