Hyperparameter Importance | PyTorch Developer Day 2020

PyTorch · Intermediate · 🧬 Deep Learning · 5y ago

Key Takeaways

Examines hyperparameter importance: how to find which hyperparameters matter most for your PyTorch code and how to tune them using Optuna.

Full Transcript

Hello, my name is Crissman Loomis, and I'm an engineer at Preferred Networks. Today I'll be presenting on hyperparameter importance.

A little bit more about myself: I live in Japan and, as mentioned, have been working for a company called Preferred Networks, which is one of the premier industrial deep learning companies in Japan. Previously I worked on the Chainer team within Preferred Networks, which is one of the precursors of PyTorch and helped define the define-by-run algorithm, although now I'm on the auto machine learning team, specifically working on the Optuna team.

To give you an overview of what we'll be talking about tonight: the first thing we're going to talk about is what hyperparameters are. From there we'll talk about what their impact is on the performance of your algorithms, and then about current usage, how people usually tune hyperparameters today. After that we'll go into some of the criteria for choosing which hyperparameters you should tune, and then talk more about Optuna, which is a particular example of a hyperparameter optimization framework that goes very well with Python programs, and specifically with PyTorch, and we'll see how it could look inside PyTorch itself. Then we'll talk about how many hyperparameters you should tune at a time and how you can go about choosing them.

So, starting with what hyperparameters are. Typically in the deep learning world you might think of hyperparameters as the number of layers, the number of units within each layer, or perhaps the learning rate of your optimizer. Generally they control the behavior of the algorithms and determine the performance of those algorithms, how well they actually do. They're typically set manually, and we'll say more about that in a minute, but they also determine the success or failure of your overall algorithms. Object detection with a bad threshold hyperparameter, as you can see on the left, can produce too many bounding boxes, while a good threshold hyperparameter can provide a much cleaner result where you have basically one bounding box per object.

But these are not the only hyperparameters; there might be more than you thought of before. If you just take a look at the image itself, there's the encoding that's used for the image, the order, the image size that's used, or the JPEG decoder. Then within the neural network trainer there's the batch size, which optimizer is chosen (stochastic gradient descent, Adam, momentum, or others), and the learning rate that's used by that optimizer. In this particular case, since we're looking at a detector model, there's all of the visual information for the CNN: the backbone architecture, whether VGG or ResNet, the kernel size that's used to go over the image, batch normalization order, and other things. And then down at the hardware layer you might be looking at whether you want to use FP16, FP32, or mixed precision, or, on an NVIDIA GPU, what CUDA kernel parameters you want to use. But swimming in too many of these causes a different issue.

But let's take a look at the impact. We looked at a hyperparameter optimization paper, a review of algorithms and applications, and found that if you compare random search with Bayesian optimization, Hyperband, and Bayesian optimization with Hyperband, it could provide almost a 20 times speedup. This advantage persisted over long time frames as well, and could actually increase to a 50-plus times speedup. So the hyperparameters make a great difference to the overall performance.

But if you look at current practice, there was a survey of machine learning experimental methods at NeurIPS 2019 and ICLR 2020, and it found that, of all the people who were working with programs that have hyperparameters, the majority were either not tuning them, tuning manually, or using random search, and only about six percent were using a hyperparameter optimization framework. So, given the huge benefits available from tuning hyperparameters, we think this is a real opportunity for improving the general performance of PyTorch and deep learning.

The problem is that if you try to optimize all of those hyperparameters simultaneously, you're going to run into the curse of dimensionality. With all those hyperparameters being tuned at the same time, the search space becomes too high-dimensional, and it will take a long time for any hyperparameter optimization framework to find the best hyperparameters. To combat this, we took a look at a paper, An Efficient Approach for Assessing Hyperparameter Importance. Hyperparameter importance is basically a way to find out which of the hyperparameters make the most difference to the overall performance of your algorithms. We then implemented hyperparameter importance in the Optuna framework, which we believe is the next-generation hyperparameter optimization framework, because it allows you not only to optimize your hyperparameters but, using hyperparameter importance, to select the most important ones to work with.

So let's take a look at how this could look in PyTorch. In the model definition we have to define a trial. This is doing a simple MNIST, and around the fifth line you see that out_features uses the trial object that was passed into the function and calls trial.suggest_int for the number of units in the layer, between 4 and 128. It also uses a categorical list of either ReLU or tanh for the activation of each layer, and the next line provides a float for the dropout value, which ranges from 0.2 to 0.5. Notice that all of these hyperparameters are defined within the actual code itself, in a define-by-run kind of way, where it's very intuitive to see what the range is, and it's defined using Pythonic syntax for easy troubleshooting and definition.

As we go on to the objective function, you can see that within the objective function as well we need a trial object, which is passed in. This objective function is then used by Optuna to evaluate how well the trial performed. In this one we have an optimizer name, which is also given by the trial object through suggest_categorical, which picks the optimizer from a list of Adam, RMSprop, and stochastic gradient descent. The next line gives us a log-uniform, which is a float that varies logarithmically, to give us the full range of possible learning rates between 10 to the negative fifth and 10 to the negative one.

Looking at the results of this simple MNIST example, we find some things that maybe confirm what you might have guessed, which is that learning rate is the most important hyperparameter. But the next most important hyperparameter, at half the importance of the learning rate, is actually the number of units in the very first layer. The number of units in the second layer is much less important, less than a fifth as important. The third most important hyperparameter is which optimizer was used, which you might have guessed was an important factor, but again it is not as important as the number of units in the first layer. Beyond that, we see that the dropout in the first layer is also one of the important hyperparameters, and the other hyperparameters are less so.

So how many hyperparameters do we recommend that you pick? From our experience, for reasonable time versus performance, about the top three to five hyperparameters are the best ones to focus on. The steps for tuning with hyperparameter importance are: start by tuning all the hyperparameters you think might matter for the first 100 or so trials, to give yourself a solid baseline and to give Optuna some time to search the hyperparameter space. Then pick about three to five hyperparameters, using hyperparameter importance to see which ones have the most impact on the performance of your algorithm. Then run the rest of the trials with the time or compute that you have available to you, and hopefully win.

As you can see, there's kind of a hyperparameter evolution. The first step is just using the defaults, not tuning the hyperparameters. The next step is manually fidgeting with those hyperparameters to see which is the most important. After that, maybe using a grid search, and then hopefully using a hyperparameter optimizer like Optuna to systematically look for the best hyperparameters using Bayesian optimization. And finally, hopefully, using Optuna leveraged with hyperparameter importance, so that you narrow things down to the hyperparameters that have the most impact on your overall performance.

These are some of the resources you can look at for more information. There's optuna.org, which is the home page for Optuna, and our GitHub at optuna/optuna. There is also an ecosystem presentation on using Optuna with PyTorch, which you can find by googling, and we have the papers that I've referred to in this discussion. Thank you for your attention. I hope that you found Optuna interesting, and have a good rest of your PyTorch Dev Day. Thank you.

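As a back-of-the-envelope illustration of the curse of dimensionality mentioned in the talk, the sketch below counts how many configurations an exhaustive grid would contain. The hyperparameter names echo ones listed in the talk, but the number of candidate values per hyperparameter is purely an assumption chosen for illustration.

```python
import math

# Hypothetical number of candidate values tried for each hyperparameter.
CANDIDATES = {
    "lr": 5,
    "optimizer": 3,
    "batch_size": 4,
    "n_units_l0": 6,
    "n_units_l1": 6,
    "activation": 2,
    "dropout": 4,
    "kernel_size": 3,
    "backbone": 2,
    "precision": 3,
}


def grid_size(names):
    """Number of configurations in a full grid over the given hyperparameters."""
    return math.prod(CANDIDATES[name] for name in names)


print(grid_size(CANDIDATES))                           # 311040
print(grid_size(["lr", "n_units_l0", "optimizer"]))    # 90
```

Under these assumed counts, tuning all ten hyperparameters at once means a grid of 311,040 configurations, while focusing on three of them leaves 90, which is why narrowing to the most important hyperparameters makes the remaining budget go much further.
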
Original Description

Hyperparameters are manual, often hard-coded, settings in programming, but many programmers don't use a hyperparameter optimizer. In this talk, engineer and business developer Crissman Loomis examines what hyperparameters are, how to find out what the most important hyperparameters for your PyTorch code are, and how to tune them using Optuna.
