Apex - Michael Carilli, NVIDIA
Key Takeaways
The video discusses using mixed precision with PyTorch and NVIDIA's Apex extension to improve deep learning training performance on NVIDIA GPUs. It highlights the benefits of using a mix of torch.float32 and torch.float16 to take advantage of hardware capabilities and achieve substantial speedups while maintaining accuracy.
Full Transcript
[Music]

All right, hello everyone. My name is Michael Carilli. I am a developer technology engineer at NVIDIA on the PyTorch frameworks team, and today I'm going to be talking to you about training with mixed precision: using a mixture of torch.float, or FP32, and torch.half, or FP16, to take full advantage of the hardware capabilities that NVIDIA's latest GPUs provide.

So what are the benefits of this? Using mixed precision and our latest Tensor Core-enabled architectures, your networks can achieve substantially improved speed-ups. They can be more memory efficient while remaining just as accurate, without needing to retune your hyperparameters. Alternatively, the speed-ups and memory savings can enable you to experiment with larger networks or larger batch sizes.

So the first question you may be wondering is: why not just stick with the default torch.float, or FP32? The answer is, first of all, FP16, or torch.half, takes up half the memory storage, so it can see a 2x speed-up for bandwidth-bound operations. But that's not the only benefit. On NVIDIA's Tensor Core GPUs, there's dedicated hardware support for matrix multiplies and convolutions with FP16 input, and these hardware cores, these Tensor Cores, give 8x improved computational throughput for such operations, for matrix multiplies and convolutions. So if your network happens to use a lot of matrix multiplies and convolutions, you can see significantly greater than a 2x end-to-end speed-up, and I'll provide some concrete examples of that shortly.

So now you may be asking the opposite question: why not just use torch.half for everything? The answer is that some operations, like accumulations and optimizer updates, also benefit from the wider dynamic range and increased precision of FP32. The idea behind mixed precision is that by assigning each operation its optimal precision, you obtain the speed of FP16 and the precision of FP32, take full advantage of the hardware capabilities of NVIDIA GPUs, and achieve high speed as well as stability.

So how well does this work in practice? Here's an example that shows we've achieved substantial speed-ups on a diversity of highly relevant real-world networks, and the links below show that you can check out these examples yourself. Your speed-ups may vary depending on whether your network is more compute bound, more bandwidth bound, or constrained by something else, like data loading. For example, BERT, you can see, achieved a pretty hefty speed-up because it uses a lot of very expensive matrix-matrix multiplies, for which the Tensor Cores are highly beneficial.

You may also be wondering: does training with mixed precision affect my accuracy? In practice, we found that all the networks that we've trained with FP16, or rather with mixed precision, have converged to accuracy comparable to pure FP32 training with no hyperparameter changes. So that's encouraging.

So how can you realize these benefits for your own network? We've developed this tool called automatic mixed precision, or Amp, whereby you can insert a few lines of Python into your script and it will do this entire recipe for you automatically. Here's how that looks in a simple example. These three lines ensure that every operation runs in its appropriate precision. This tool is available to try today through the NVIDIA repository of Apex utilities. I've updated the landing pages to contain a link to this talk, so you can also use those to track down the deep learning examples of BERT, Mask R-CNN, etc. that showcase mixed precision best practices as well as the resulting speed-ups.

So as PyTorch developers, naturally the best way to reach you guys is through PyTorch itself. So today I'm happy to announce that we are currently working with the PyTorch core team to enable native support for mixed precision in PyTorch this quarter. I've already got an API request for comment up as well as the first PR, so feel free to comment and let us know what you need. We have certainly tried to take into account all the complex use cases we've encountered along the way to make sure that the implementation will be powerful and flexible, but we're also interested to hear what you guys have to say.

[Music] [Applause]
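
Editor's sketch (not from the talk): the speaker notes that FP16 takes half the storage of FP32, while accumulations and optimizer updates benefit from FP32's wider dynamic range and precision. The snippet below illustrates both points using only the Python standard library, emulating the two formats by round-tripping values through `struct`. The helper names (`to_fp16`, `to_fp32`, `apply_updates`) and the numbers chosen (a weight of 1.0, an update of 1e-4, 1000 steps) are assumptions made for illustration. It does not use PyTorch or Apex and is not how Amp is implemented.

```python
import struct


def to_fp16(x: float) -> float:
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def to_fp32(x: float) -> float:
    """Round a Python float to the nearest IEEE 754 single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


# Storage: one FP16 value takes 2 bytes, one FP32 value takes 4 bytes.
fp16_bytes = struct.calcsize("<e")
fp32_bytes = struct.calcsize("<f")


def apply_updates(weight: float, update: float, steps: int, cast) -> float:
    """Repeatedly add a small update, rounding to the given format each step."""
    w = cast(weight)
    for _ in range(steps):
        w = cast(w + cast(update))
    return w


# Hypothetical optimizer-style accumulation: 1000 updates of 1e-4 onto 1.0.
# In FP16 the weight stays at 1.0, because each 1e-4 step is smaller than
# half the spacing between adjacent FP16 values near 1.0 (2**-10).
w_fp16 = apply_updates(1.0, 1e-4, 1000, to_fp16)
# In FP32 the updates accumulate to roughly 1.1.
w_fp32 = apply_updates(1.0, 1e-4, 1000, to_fp32)

# Dynamic range: a value of 1e-8 underflows to zero in FP16 but not in FP32.
tiny_fp16 = to_fp16(1e-8)
tiny_fp32 = to_fp32(1e-8)

print("bytes per value:", fp16_bytes, "(FP16) vs", fp32_bytes, "(FP32)")
print("weight after updates:", w_fp16, "(FP16) vs", w_fp32, "(FP32)")
print("1e-8 stored as:", tiny_fp16, "(FP16) vs", tiny_fp32, "(FP32)")
```

The lost updates in the FP16 case are one reason a mixed precision recipe would keep accumulations and weight updates in FP32 while running bandwidth-bound and matrix-multiply work in FP16.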
Original Description
Apex is an open-source PyTorch extension that helps users maximize deep learning training performance on NVIDIA GPUs. Mixed precision utilities in Apex are designed to improve training speed while maintaining the accuracy and stability of training in single precision. Learn more in this talk.