AutoML-Zero
Key Takeaways
The video explores AutoML-Zero, an evolutionary search method for discovering machine learning programs, and shows how the discovered programs adapt to different tasks and problem settings. AutoML-Zero searches a sparse, generic space of basic mathematical operations using regularized evolution, both to discover algorithms from scratch and to improve on existing ones.
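The evolutionary search mentioned above can be illustrated with a small sketch of an aging-style evolution loop: sample a few candidates, copy and mutate the fittest, add the child, and drop the oldest member. This is only a toy under stated assumptions, not the authors' implementation. The integer-list "program", the `TARGET` list, the `fitness` function, and all parameter values are hypothetical stand-ins for real programs and task accuracy.

```python
import random
from collections import deque

# Hypothetical stand-in for a task: a candidate is "fit" when it is close to TARGET.
TARGET = [3, -1, 4, 1, -5]


def fitness(candidate):
    """Toy fitness: negative squared distance to TARGET (higher is better)."""
    return -sum((c - t) ** 2 for c, t in zip(candidate, TARGET))


def mutate(candidate, rng):
    """Copy the parent and nudge one randomly chosen entry by +1 or -1."""
    child = list(candidate)
    index = rng.randrange(len(child))
    child[index] += rng.choice([-1, 1])
    return child


def evolve(cycles, population_size, tournament_size, seed=0):
    """Aging evolution: the oldest member is removed every cycle."""
    rng = random.Random(seed)
    # Start from "empty" candidates, oldest on the left of the queue.
    population = deque([0] * len(TARGET) for _ in range(population_size))
    best = list(population[0])
    for _ in range(cycles):
        # Tournament selection: the fittest of a random sample becomes the parent.
        contenders = rng.sample(list(population), tournament_size)
        parent = max(contenders, key=fitness)
        child = mutate(parent, rng)
        population.append(child)   # the newest member joins on the right
        population.popleft()       # the oldest member is discarded
        if fitness(child) > fitness(best):
            best = child
    return best


best = evolve(cycles=2000, population_size=20, tournament_size=5)
print(best, fitness(best))
```

Removing the oldest member rather than the worst one means a candidate only persists through its descendants, which is one way to read the "regularized" part of the search.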
Full Transcript
AutoML-Zero is a new algorithm for searching for machine learning algorithms. AutoML-Zero starts from setup, predict, and learn functions with access to memory that stores scalar, vector, or matrix variables. It learns to select from an enormous set of basic operations, like addition, dot products, or normalization, in order to construct machine learning algorithms from the setup, predict, and learn component functions, each of which manipulates values in memory to classify vector input features according to their scalar labels. Because the search space is very sparse, and most combinations of random operations on these memory addresses result in complete nonsense functions, it's important to have a clever search algorithm rather than just using random search. The authors use an evolutionary search algorithm equipped with migration, dataset diversity, and hurdling to expedite the search and evaluate 10,000 models per second per CPU. Through their experiments, the authors show that this search can discover neural networks trained with gradient descent, dropout operations, and ReLU activations from this enormous search space, starting from empty component functions. This video will explain the details behind AutoML-Zero.

AutoML-Zero is a new paper from researchers at Google that uses evolutionary search to discover machine learning programs by selecting basic mathematical operations that fit into a human-designed framework of setup, predict, and learn functions. AutoML-Zero is one of the most expressive and generic search spaces ever explored in AutoML, neural architecture search, or meta-learning research.

The idea behind machine learning from scratch in AutoML-Zero is to start off with very generic programs that consist of setup, predict, and learn functions. A program also has access to memory addresses of scalar, vector, and matrix variables that it can use to take in the input features v0, run them through a forward pass, and then learn how to
update the features, or the different values in the memory variables, in order to perform classification tasks such as CIFAR-10 binary classification. There you take as input a flattened image vector, you learn how to multiply it with the weights, with different operations like adding noise to this multiplication, and then you learn the learn function that updates the weights contained in these memory addresses. In this way the machine learning program is learned from scratch: through genetic programming, the search symbolically changes the operations contained in the setup, predict, and learn component functions in order to develop these machine learning algorithms.

This image shows how linear regression is represented in the search space of AutoML-Zero. AutoML-Zero has the core framework of setup, predict, and learn functions. In setup you initialize the values of certain variables in memory, so in this case you initialize the scalar variable s2 with 0.001, which is later used as a learning rate in the learn function. Then in predict, the program learns to set s1 to the dot product between the weights stored in the vector address v1 and the input features v0. In the learn function, it learns to take the predicted label and the real label s0, compute the error, apply the learning rate, compute the gradient, and then update the weights.

The really interesting thing about AutoML-Zero is that this starts off as just random operations in every component function of setup, predict, and learn. The fact that the search is able to learn something like an initialized learning rate, dot products, or the gradients is really amazing, because it has to search this enormous search space that is really sparse, and most of the programs that you would discover in this kind of space are completely useless.

This image from their GitHub repository describing the paper further illustrates how
the corresponding program corresponds with the different operations used in the forward pass and the backward pass to update the weights. In this way the program is learning how to set up the variables, how to run a forward pass and make a prediction of the class label given input features, and then how to update the different variables that it's using in order to make better predictions in the future.

The programs designed in AutoML-Zero have this search space of different operations that they can apply to each of the variables in order to develop these machine learning algorithms. They can do things like basic multiplications, absolute values of scalars, sine and cosine functions, exponentials, and logarithms: all these different operations can be applied to the different variables in order to compose the functions that update weights and make predictions from the input features. It's really amazing that, given this sparse space where most of these operations don't make much sense in the context of machine learning, these machine learning programs can still be designed through the use of evolutionary search.

These machine learning programs are discovered in the search space through evolutionary search. Evolutionary search describes the framework of having a population of programs, selecting a subset of them based on evaluating them with a fitness function, mutating the selected programs with some operations, and then putting those mutated programs back into the population. In particular, in this case they use regularized evolution, which is where you penalize members that have been in the population and survived several rounds of mutation and selection, so older programs are removed in favor of newer ones.

The key workhorse behind these evolutionary algorithms is the way in which they can mutate the candidate programs to form new ones. In this paper, AutoML-Zero, the first way they can mutate a program is by inserting a new operation,
so just taking the existing learn component function of the parent that had survived the previous evaluation of the fitness function, adding "s2 equals sine of v1" into it, and forming a new program with this mutation. The next type of mutation is to randomize a given component function. In this case they take the predict function of the parent and randomly assign a new set of operations to form the predict function of the child program. The third way they can mutate the parent program is to take a given variable and randomly change it to form a new function. This is the way in which these programs evolve and mutate to form new programs in the evolutionary search process.

The AutoML-Zero search space for machine learning algorithms is very sparse and generic, meaning that most of these programs are completely useless. For example, in the image describing the different mutations, one predict function assigns the scalar s0 to the mean of some matrix that hasn't been defined at all in setup and isn't learned at all in the learn function. Then it assigns another scalar, s3, to be the cosine of some other random scalar value s7. You see through this kind of comical example that most of these functions are completely useless and don't do anything that will help move the needle towards machine learning algorithms, like taking in input features and then predicting CIFAR-10 classes like dog versus truck.

So in order to make this algorithm work, they need to evaluate a ton of different models with their evolutionary search strategy. In particular, the authors describe searching through two to ten thousand different models per second per CPU. One of the ways they do this is by implementing migration, which is a way of shuffling these different
models across the different CPUs to ensure that you have diversity in the population across the different workers within this distributed system. Then they have functional equivalence checking, which is where they make sure two programs don't have the same output for the given v0 input features. Then they have dataset diversity: the way they search through tasks like MNIST or CIFAR-10 is by doing binary classification, where they construct binary classification tasks from a multi-class classification dataset. So for example 0 versus 3 in MNIST, or 2 versus 7, or 3 versus 9, and then in CIFAR-10 things like car versus truck or dog versus cat. They find that having a more diverse set of tasks further helps the usefulness of these programs. And then they implement progressive dynamic hurdles, which is an algorithm from a previous paper from these authors, the Evolved Transformer, where an intermediate fitness evaluation truncates the models that aren't performing very well.

To learn more about some of the ways the authors speed up the throughput of their evolutionary search, you can check out progressive dynamic hurdles in the Evolved Transformer, and you can check out this description of the migration algorithm for shuffling the populations to increase diversity across the different workers working on this population of evolutionary candidates. You can also read the paper on parallelism and evolutionary algorithms, which describes a lot of these important technical implementation details for increasing the number of models you can evaluate in the evolutionary search.

Taking the search space of primitive mathematical operations that operate on memory addresses of scalar, vector, and matrix variables, they're looking to experiment on linear regression, predicting labels constructed by a teacher network, and then CIFAR-10, MNIST, Street View
House Numbers, and Tiny ImageNet classification problems. They're looking to answer these questions. First, how difficult is it to search the AutoML-Zero space? Can they use evolutionary search to search this really generic and sparse space, where most of the combinations of operations and memory accesses are completely useless?

The next question is: can we use our framework to discover reasonable algorithms with minimal human input? The idea is that how you seed these evolutionary algorithms is really important. For example, if you seed the population with that original linear regression model, it's much more likely to move toward successful classification of CIFAR-10 compared to randomly initializing it with different operations. What they show later in their experiments is that if you take any given successful program and initialize the population with it, you can further build on that algorithm. So with these evolutionary search strategies it's really important to think about how you initialize the population and how far the search has to go to get somewhere useful.

The third question is: can we discover different algorithms by varying the type of task we use during the search experiment? This describes the way they construct these binary classification tasks, in a similar framework to meta-learning, where they leverage the way you classify one problem to learn more about another one.

The authors begin their experiments with linear regression problems and affine regression problems. This shows the difference between random search and evolutionary search as the problems start to get harder. In the case of linear regression, random search is able to have some success rate and find programs that can perform the linear regression task, but as the problems get more challenging you need a more sophisticated evolutionary search strategy. This is important because in a lot of papers, like
Exploring Randomly Wired Neural Networks and hierarchical neural architecture search, the researchers design such a dense architecture search space that you can just randomly compose cells from the search space and still form useful neural architectures. But in the AutoML-Zero search space you need more sophisticated search algorithms like evolutionary search, rather than just randomly filling in different values within the genetic code of the program, in order to find useful programs, because of how sparse and random the space is.

This animation from the GitHub repository describing their paper, AutoML-Zero, shows the evolution of the programs and the different functions that they learn progressively through evolutionary search. You see how it starts off with the linear model without stochastic gradient descent, then it learns loss clipping, a random learning rate, a hard-coded learning rate, ReLU activations, and different things that it learns progressively through evolving these different components of the function, with that mutation space of randomly inserting or removing operations, randomly flipping, say, s2 to s3 or s0, or completely randomizing the inside of something like learn, predict, or setup. It's amazing to see this progression: it goes from just setup, predict, and learn, the empty algorithm, all the way to a linear model, and then all the way to a two-layer neural network that's able to reach something like 84% CIFAR-10 binary classification accuracy with the downsampled, flattened vector representation of the CIFAR-10 images.

Some of the interesting details of the final program learned through evolutionary search on the CIFAR-10 binary classification task: it learns to add noise to the input. In the setup function it learns to sample the values in the v1 memory address according to a uniform distribution parameterized by alpha and
beta, and then adds this noise into the v0 input features and stores the result in the v2 memory address. Then it learns multiplicative interactions, so it learns to take the dot product between the vectors v3 and v4 and store it in the s1 scalar. Then it learns to normalize the weight matrix. This is a really interesting discovery from this program: it learns to do this kind of normalization of the weight matrix by dividing it by its magnitude, so that it's like a unit matrix. So it has gradient normalization and a normalized error that are learned through this evolutionary search over the different operations. Then it learns the accumulation of weight matrices: it learns to accumulate the weights so that it doesn't have too stark of an update to the weights that it stores in the memory address m2.

The authors further test how this evolutionary search over programs adapts to extreme cases such as few training examples, few training epochs, or multi-class classification. In the setting of few training examples, where you only have, say, 40 images of dogs and 40 images of cats and you're doing binary classification, it learns to apply noise to the forward pass of its prediction in order to adapt to that problem setting. In the setting of fast training, it learns to do a kind of learning rate decay in order to adapt to only having a few epochs to train the model when applying the learn function within the algorithm found by evolutionary search. And in the case of multiple classes, it learns to apply a kind of normalization to the weight matrix to adapt to the multi-class problem.

There's been a lot of research in AutoML, neural architecture search, learning update rules, or learning data augmentation functions. One of the connections between AutoML-Zero and neural architecture search is that AutoML-Zero is a much more generic and expressive search space compared to a lot of these previous neural
architecture search strategies. For example, in the Evolved Transformer, all the search strategy has to do is learn to describe a given cell, and this cell is repeated on top of itself several times to form the overall neural network architecture. Other papers, like hierarchical neural architecture search, construct the cell representation of the blocks that make up the overall architecture in such a way that even random search can perform well. Other papers, like Exploring Randomly Wired Neural Networks, further show that if you have this kind of representative cell search space, you can just randomly connect it to achieve good performance. Whereas in AutoML-Zero the search space is so expressive, and sparse rather than dense, that you need a sophisticated search strategy like evolutionary search in order to find useful candidates in this population of different ways of configuring the search space.

Thanks for watching this explanation of AutoML-Zero for designing machine learning algorithms. Hopefully from this video you're able to take away how these primitive operations access memory addresses to form machine learning algorithms: setting up the values of the variables, then learning how to run the forward pass in the predict function, where you can do things like add noise in the forward pass, as in the experiments with few training examples, and also learning how to update the values in these memory addresses with the learn component function. In this framework of setup, predict, and learn, and through evolutionary search where they mutate the different operations to form new programs, they're able to do things like CIFAR-10 binary classification with a two-layer neural network. It's a really interesting study, and one of the most expressive and generic search spaces tested in the research area of AutoML, neural architecture search, or, say, meta-learning. Thanks for
watching, and please subscribe to Henry AI Labs for more deep learning and AI videos.
Original Description
This video explores AutoML-Zero, an evolutionary search for machine learning programs. These programs are initially empty with Setup, Predict, and Learn functions that can access scalar, vector, and matrix memory addresses. Through fitness evaluation and mutation, these programs evolve to use gradient descent, dropout-like operations, and ReLU activation functions! Thanks for watching, Please Subscribe!
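The mutation step can be sketched with a toy program representation. This is an illustration only, not code from the AutoML-Zero repository: the instruction format `(destination, op, source)`, the `OPS` and `ADDRESSES` lists, and all function names are hypothetical. It covers the three mutation types described in the video: inserting an instruction, randomizing a whole component function, and changing one variable of an instruction.

```python
import random

OPS = ["sin", "cos", "abs", "exp"]      # hypothetical unary operations
ADDRESSES = ["s0", "s1", "s2", "s3"]    # hypothetical scalar memory addresses
COMPONENTS = ["setup", "predict", "learn"]


def random_instruction(rng):
    """One instruction as a (destination, op, source) tuple."""
    return (rng.choice(ADDRESSES), rng.choice(OPS), rng.choice(ADDRESSES))


def copy_program(program):
    """Copy each component list so the parent is never modified."""
    return {name: list(program[name]) for name in COMPONENTS}


def insert_instruction(program, rng):
    """Mutation 1: insert a random instruction into one component function."""
    child = copy_program(program)
    body = child[rng.choice(COMPONENTS)]
    body.insert(rng.randint(0, len(body)), random_instruction(rng))
    return child


def randomize_component(program, rng):
    """Mutation 2: replace every instruction in one component function."""
    child = copy_program(program)
    name = rng.choice(COMPONENTS)
    child[name] = [random_instruction(rng) for _ in child[name]]
    return child


def modify_argument(program, rng):
    """Mutation 3: re-draw the source address of one instruction."""
    child = copy_program(program)
    non_empty = [name for name in COMPONENTS if child[name]]
    if not non_empty:
        return child
    body = child[rng.choice(non_empty)]
    index = rng.randrange(len(body))
    destination, op, _ = body[index]
    body[index] = (destination, op, rng.choice(ADDRESSES))
    return child


def size(program):
    """Total number of instructions across all component functions."""
    return sum(len(program[name]) for name in COMPONENTS)


rng = random.Random(0)
parent = {"setup": [],
          "predict": [("s1", "sin", "s0")],
          "learn": [("s2", "abs", "s1")]}
children = [insert_instruction(parent, rng),
            randomize_component(parent, rng),
            modify_argument(parent, rng)]
for child in children:
    print(child)
```

Most children produced this way are meaningless programs, which matches the point in the video that the search space is sparse and needs many evaluations per second.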
Paper Links:
AutoML-Zero: https://arxiv.org/pdf/2003.03384.pdf
Github Repo AutoML-Zero: https://github.com/google-research/google-research/blob/master/automl_zero/README.md
The Evolved Transformer: https://arxiv.org/pdf/1901.11117.pdf
Hierarchical Representations for Efficient Architecture Search: https://arxiv.org/pdf/1711.00436.pdf
Exploring Randomly Wired Neural Networks for Image Recognition: https://arxiv.org/pdf/1904.01569.pdf
Uploaded by Connor Shorten