Apex - Michael Carilli, NVIDIA

PyTorch · Advanced ·🧬 Deep Learning ·6y ago

Key Takeaways

The video discusses using mixed precision with PyTorch and NVIDIA's Apex extension to improve deep learning training performance on NVIDIA GPUs. It highlights the benefits of using a mix of torch.float32 and torch.float16 to take advantage of hardware capabilities and achieve substantial speedups while maintaining accuracy.

Full Transcript

[Music] All right, hello everyone. My name is Michael Carilli. I am a developer technology engineer at NVIDIA on the PyTorch frameworks team, and today I'm going to be talking to you about training with mixed precision: using a mixture of torch.float, or FP32, and torch.half, or FP16, to take full advantage of the hardware capabilities that NVIDIA's latest GPUs provide.

So what are the benefits of this? Using mixed precision and our latest Tensor Core-enabled architectures, your networks can achieve substantially improved speedups. They can be more memory efficient while remaining just as accurate, without needing to retune your hyperparameters. Alternatively, the speedups and memory savings can enable you to experiment with larger networks or larger batch sizes.

So the first question you may be wondering is: why not just stick with the default torch.float, or FP32? The answer is, first of all, FP16, or torch.half, takes up half the memory storage, and it can see a 2x speedup for bandwidth-bound operations. But that's not the only benefit. On NVIDIA's Tensor Core GPUs there's dedicated hardware support for matrix multiplies and convolutions with FP16 input, and these Tensor Cores give 8x improved computational throughput for such operations, for matrix multiplies and convolutions. So if your network happens to use a lot of matrix multiplies and convolutions, you can see significantly greater than a 2x end-to-end speedup, and I'll provide some concrete examples of that shortly.

So now you may be asking the opposite question: why not just use torch.half for everything? The answer is that some operations, like accumulations and optimizer updates, also benefit from the wider dynamic range and increased precision of FP32. The idea behind mixed precision is that by assigning each operation its optimal precision, you obtain the speed of FP16 and the precision of FP32, take full advantage of the hardware capabilities of NVIDIA GPUs, and achieve high speed as well as stability.

So how well does this work in practice? Here's an example that shows we've achieved substantial speedups on a diversity of highly relevant real-world networks, and the links below show that you can check out these examples yourself. Your speedups may vary depending on whether your network is more compute bound, more bandwidth bound, or constrained by something else, like data loading. For example, BERT, you can see, achieved a pretty hefty speedup because it uses a lot of very expensive matrix-matrix multiplies, for which the Tensor Cores are highly beneficial.

You may also be wondering: does training with mixed precision affect my accuracy? In practice, we found that all the networks that we've trained with FP16, or rather with mixed precision, have converged to comparable accuracy as pure FP32 training, with no hyperparameter changes. So that's encouraging.

So how can you realize these benefits for your own network? We've developed this tool called automatic mixed precision, or Amp, whereby you can insert a few lines of Python into your script and it will do this entire recipe for you automatically. Here's how that looks in a simple example: these three lines ensure that every operation runs in its appropriate precision. This tool is available to try today through the NVIDIA repository of Apex utilities. I've updated the landing pages to contain a link to this talk, so you can also use those to track down the deep learning examples of BERT, Mask R-CNN, etc. that showcase mixed precision best practices as well as the resulting speedups.

So, as PyTorch developers, naturally the best way to reach you guys is through PyTorch itself. So today I'm happy to announce that we are currently working with the PyTorch core team to enable native support for mixed precision in PyTorch this quarter. I've already got an API request for comment up as well as the first PR, so feel free to comment and let us know what you need. We have certainly tried to take into account all the complex use cases we've encountered along the way to make sure that the implementation will be powerful and flexible, but we're also interested to hear what you guys have to say. [Music] [Applause]
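The talk's remark that accumulations benefit from FP32 can be checked without a GPU. The following is a minimal sketch, not code from the talk: it uses Python's standard `struct` module (format codes `"e"` for half precision and `"f"` for single precision) to round a running sum after every addition. The helper names `to_fp16`, `to_fp32`, and `accumulate` and the test values are assumptions made for illustration.

```python
import struct


def to_fp16(x):
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("e", struct.pack("e", x))[0]


def to_fp32(x):
    """Round a Python float to the nearest IEEE 754 single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def accumulate(values, round_fn):
    """Sum `values`, rounding each operand and each partial sum with `round_fn`."""
    total = 0.0
    for v in values:
        total = round_fn(total + round_fn(v))
    return total


# Hypothetical workload: many small contributions whose exact sum is 10.0.
values = [0.001] * 10_000

fp16_total = accumulate(values, to_fp16)
fp32_total = accumulate(values, to_fp32)

print("bytes per element: FP16 =", struct.calcsize("e"), "FP32 =", struct.calcsize("f"))
print("FP16 running sum:", fp16_total)
print("FP32 running sum:", fp32_total)
```

The FP32 sum lands close to 10, while the FP16 sum stalls well short of it: once the running total is large enough that 0.001 is less than half the spacing between adjacent FP16 values, each addition rounds back to the same number. The byte counts also show the halved storage the talk mentions.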

Original Description

Apex is an open-source PyTorch extension that helps users maximize deep learning training performance on NVIDIA GPUs. Mixed precision utilities in Apex are designed to improve training speed while maintaining the accuracy and stability of training in single precision. Learn more in this talk.

This video teaches how to use mixed precision with PyTorch and NVIDIA's Apex extension to improve deep learning training performance on NVIDIA GPUs. It covers the benefits of using a mix of torch.float32 and torch.float16 and provides examples of how to achieve substantial speedups while maintaining accuracy.

Key Takeaways
  1. Install PyTorch and NVIDIA's Apex extension
  2. Convert model to use mixed precision
  3. Use Automatic Mixed Precision (AMP) tool to automate the process
  4. Test and evaluate model performance
  5. Keep your existing hyperparameters; the talk reports comparable accuracy without retuning them
💡 Using mixed precision with PyTorch and NVIDIA's Apex extension can significantly improve deep learning training performance on NVIDIA GPUs while maintaining accuracy.
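
The steps above can be sketched roughly as follows. This is an illustrative sketch under stated assumptions, not the code shown in the talk: it needs a CUDA-capable NVIDIA GPU with PyTorch and Apex installed, and the toy model, tensor shapes, and the function name `train_with_amp` are hypothetical. `amp.initialize` and `amp.scale_loss` are the calls Apex documents for Amp; that they match the "three lines" on the speaker's slide is an assumption, since the slide is not part of the transcript.

```python
def train_with_amp(steps=3):
    """Toy training loop with Apex Amp (hypothetical model and data).

    Requires PyTorch, NVIDIA Apex, and a CUDA GPU; imports are kept inside
    the function so that merely defining it has no such requirement.
    """
    import torch
    from apex import amp  # Amp lives in the Apex extension

    model = torch.nn.Linear(16, 4).cuda()
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

    # Let Amp prepare the model and optimizer; "O1" is Apex's
    # mixed-precision opt level that chooses a precision per operation.
    model, optimizer = amp.initialize(model, optimizer, opt_level="O1")

    loss = None
    for _ in range(steps):
        inputs = torch.randn(8, 16, device="cuda")       # made-up batch
        targets = torch.randint(0, 4, (8,), device="cuda")
        loss = torch.nn.functional.cross_entropy(model(inputs), targets)

        optimizer.zero_grad()
        # Run backward through Amp's scaled loss instead of loss.backward().
        with amp.scale_loss(loss, optimizer) as scaled_loss:
            scaled_loss.backward()
        optimizer.step()

    return loss.item()
```

On a suitable machine, calling `train_with_amp()` runs a few steps and returns the last loss value. The rest of the loop is ordinary PyTorch, which is the point of the "few lines of Python" the talk describes.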
