Apex - Michael Carilli, NVIDIA

PyTorch · Advanced · 🧬 Deep Learning · 6y ago

Skills: ML Pipelines 80% · Supervised Learning 70%

Key Takeaways

The video discusses using mixed precision with PyTorch and NVIDIA's Apex extension to improve deep learning training performance on NVIDIA GPUs. It highlights the benefits of using a mix of torch.float32 and torch.float16 to take advantage of hardware capabilities and achieve substantial speedups while maintaining accuracy.
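The "mix" means each operation runs in the precision that suits it. The sketch below models that idea as a simple lookup policy, loosely patterned on the allow-list/deny-list approach Apex Amp uses at its O1 optimization level. The op names, the two sets, `Precision`, and `choose_precision` are hypothetical and abbreviated; they are not Apex's actual lists or API.

```python
from enum import Enum
from typing import Iterable


class Precision(Enum):
    FP16 = "float16"
    FP32 = "float32"


# Hypothetical, abbreviated lists (not Apex's real ones). Tensor Core-friendly
# ops run in FP16; ops that need FP32's range or precision run in FP32.
FP16_OPS = {"matmul", "conv2d", "linear"}
FP32_OPS = {"softmax", "log_softmax", "exp", "sum", "layer_norm", "cross_entropy"}


def choose_precision(op: str, input_precisions: Iterable[Precision]) -> Precision:
    """Return the precision `op` runs in under this illustrative policy."""
    if op in FP16_OPS:
        return Precision.FP16
    if op in FP32_OPS:
        return Precision.FP32
    # Any other op (e.g. add, relu) follows its inputs, promoting to FP32
    # if any input is FP32 so that no precision is silently dropped.
    if any(p is Precision.FP32 for p in input_precisions):
        return Precision.FP32
    return Precision.FP16


if __name__ == "__main__":
    # Trace a tiny hypothetical network: linear -> relu -> softmax.
    current = Precision.FP32  # inputs arrive as FP32
    for op in ["linear", "relu", "softmax"]:
        current = choose_precision(op, [current])
        print(op, current.value)
```

Under this policy the traced network runs `linear` and `relu` in FP16 and `softmax` in FP32: the matrix multiply gets Tensor Core speed while the exponential-and-sum reduction keeps FP32's range.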

Full Transcript

[Music] All right, hello everyone. My name is Michael Carilli. I am a developer technology engineer at NVIDIA on the PyTorch frameworks team, and today I'm going to be talking to you about training with mixed precision: using a mixture of torch.float (FP32) and torch.half (FP16) to take full advantage of the hardware capabilities that NVIDIA's latest GPUs provide.

So what are the benefits? Using mixed precision on our latest Tensor Core-enabled architectures, your networks can achieve substantial speedups and be more memory efficient while remaining just as accurate, without needing to retune your hyperparameters. Alternatively, the speedups and memory savings can enable you to experiment with larger networks or larger batch sizes.

The first question you may be asking is: why not just stick with the default, torch.float or FP32? First of all, FP16 (torch.half) takes up half the memory storage, so it can see a 2x speedup for bandwidth-bound operations. But that's not the only benefit. On NVIDIA's Tensor Core GPUs there is dedicated hardware support for matrix multiplies and convolutions with FP16 input, and these Tensor Cores give 8x improved computational throughput for such operations. So if your network happens to use a lot of matrix multiplies and convolutions, you can see significantly greater than a 2x speedup, and I'll provide some concrete examples of that shortly.

Now you may be asking the opposite question: why not just use torch.half for everything? The answer is that some operations, like accumulations and optimizer updates, benefit from the wider dynamic range and increased precision of FP32. The idea behind mixed precision is that by assigning each operation its optimal precision, you obtain the speed of FP16 and the precision of FP32, take full advantage of the hardware capabilities of NVIDIA GPUs, and achieve high speed as well as stability.

So how well does this work in practice? Here's an example showing that we've achieved substantial speedups on a diverse set of highly relevant real-world networks, and you can check out these examples yourself through the links below. Your speedups may vary depending on whether your network is more compute bound, more bandwidth bound, or constrained by something else, like data loading. BERT, for example, achieved a pretty hefty speedup because it uses a lot of very expensive matrix-matrix multiplies, for which the Tensor Cores are highly beneficial.

You may also be wondering: does training with mixed precision affect my accuracy? In practice, we've found that all the networks we've trained with mixed precision have converged to accuracy comparable to pure FP32 training, with no hyperparameter changes, so that's encouraging.

So how can you realize these benefits for your own network? We've developed a tool called Automatic Mixed Precision, or Amp: you insert a few lines of Python into your script, and it does this entire recipe for you automatically. Here's how that looks in a simple example; these three lines ensure that every operation runs in its appropriate precision. This tool is available to try today through Apex, NVIDIA's repository of PyTorch utilities. I've updated the landing pages to contain a link to this talk, so you can also use them to track down the deep learning examples (BERT, Mask R-CNN, etc.) that showcase mixed precision best practices as well as the resulting speedups.

As PyTorch developers, naturally the best way to reach you is through PyTorch itself. So today I'm happy to announce that we are working with the PyTorch core team to enable native support for mixed precision in PyTorch this quarter. I've already posted an API request for comments (RFC) as well as the first PR, so feel free to comment and let us know what you need. We have tried to take into account all the complex use cases we've come across along the way, to make sure the implementation will be powerful and flexible, but we're also interested to hear what you have to say. [Music] [Applause]
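The storage and precision claims in the talk can be checked without a GPU. This sketch uses Python's standard `struct` module, whose `"e"` and `"f"` format codes pack IEEE 754 half- and single-precision floats, as a stand-in for `torch.half` and `torch.float`; it is not PyTorch code. It shows that FP16 uses 2 bytes per value versus 4, that its largest finite value is 65504, and that near 1.0 it drops increments smaller than half its 2^-10 spacing, which is why accumulations and optimizer updates are better kept in FP32.

```python
import struct


def to_fp16(x: float) -> float:
    """Round x to the nearest IEEE 754 half-precision (FP16) value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def to_fp32(x: float) -> float:
    """Round x to the nearest IEEE 754 single-precision (FP32) value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def fits_in_fp16(x: float) -> bool:
    """True if x is within FP16's finite range (struct raises OverflowError otherwise)."""
    try:
        struct.pack("<e", x)
        return True
    except OverflowError:
        return False


# Bytes per element: FP16 needs half the storage (and memory traffic) of FP32.
FP16_BYTES = struct.calcsize("<e")
FP32_BYTES = struct.calcsize("<f")

# Gap between 1.0 and the next larger representable value in each format.
FP16_EPS = 2.0 ** -10
FP32_EPS = 2.0 ** -23

if __name__ == "__main__":
    print("bytes per value:", FP16_BYTES, "vs", FP32_BYTES)   # 2 vs 4
    print("0.1 stored as:", to_fp16(0.1), "vs", to_fp32(0.1))
    print("65504 fits in FP16:", fits_in_fp16(65504.0))         # True
    print("70000 fits in FP16:", fits_in_fp16(70000.0))         # False
    print("1 + 2**-12 in FP16:", to_fp16(1.0 + 2.0 ** -12))     # 1.0, increment lost
    print("1 + 2**-12 in FP32:", to_fp32(1.0 + 2.0 ** -12))     # 1.000244140625
```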

Original Description

Apex is an open-source PyTorch extension that helps users maximize deep learning training performance on NVIDIA GPUs. Mixed precision utilities in Apex are designed to improve training speed while maintaining the accuracy and stability of training in single precision. Learn more in this talk.
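One way mixed precision keeps the accuracy of single-precision training is to apply optimizer updates to FP32 weights. In Apex Amp, the O1 level leaves the model's weights in FP32, and O2 keeps separate FP32 master weights. The minimal sketch below uses Python's `struct` rounding as a stand-in for tensor dtypes to show why this matters: a weight of 1.0 updated by 1e-4 per step never moves when stored in FP16, because the update is smaller than half of FP16's spacing just below 1.0, while an FP32 master copy accumulates every update. `train_weight` and its parameters are hypothetical names for illustration.

```python
import struct


def to_fp16(x: float) -> float:
    """Round x to the nearest FP16 value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def to_fp32(x: float) -> float:
    """Round x to the nearest FP32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def train_weight(weight: float, update: float, steps: int, fp32_master: bool) -> float:
    """Apply `steps` SGD updates (weight -= update) and return the FP16 weight
    the next forward pass would use.

    fp32_master=False: the weight lives only in FP16 and is rounded every step.
    fp32_master=True: updates go to an FP32 master copy, and the FP16 weight is
    a fresh rounding of that copy.
    """
    master = to_fp32(weight)
    fp16_weight = to_fp16(weight)
    for _ in range(steps):
        if fp32_master:
            master = to_fp32(master - update)   # update accumulated in FP32
            fp16_weight = to_fp16(master)       # FP16 copy for compute
        else:
            fp16_weight = to_fp16(fp16_weight - update)  # update applied in FP16
    return fp16_weight


if __name__ == "__main__":
    print(train_weight(1.0, 1e-4, 1000, fp32_master=False))  # 1.0: every update rounded away
    print(train_weight(1.0, 1e-4, 1000, fp32_master=True))   # close to 0.9
```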

Key Takeaways

Install PyTorch and NVIDIA's Apex extension
Add Apex's Automatic Mixed Precision (Amp) to the training script; a few lines let it run each operation in its appropriate precision
Train, then compare speed and accuracy against an FP32 baseline
Keep existing hyperparameters; the talk reports comparable accuracy without retuning
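Part of what Amp automates is loss scaling: in Apex the backward pass is wrapped as `with amp.scale_loss(loss, optimizer) as scaled_loss: scaled_loss.backward()`, and the O1 and O2 levels use dynamic loss scaling by default. Gradients too small for FP16's range would flush to zero, so the loss (and therefore every gradient) is multiplied by a large factor before backward and divided back out before the optimizer step; if the scaled gradients overflow, the step is skipped and the factor is reduced. Below is a minimal pure-Python sketch of that logic, assuming `struct` rounding as a stand-in for FP16 tensors. `DynamicLossScaler`, its starting scale of 2^16, and its growth interval of 2000 steps are illustrative choices, not Apex's implementation.

```python
import math
import struct


def to_fp16(x: float) -> float:
    """Round to FP16; values beyond FP16's range become +/-inf (overflow)."""
    try:
        return struct.unpack("<e", struct.pack("<e", x))[0]
    except OverflowError:
        return math.copysign(math.inf, x)


class DynamicLossScaler:
    """Hypothetical minimal dynamic loss scaler (not Apex's API)."""

    def __init__(self, init_scale: float = 2.0 ** 16, growth_interval: int = 2000):
        self.scale = init_scale
        self.growth_interval = growth_interval
        self._clean_steps = 0

    def step(self, grads):
        """Simulate one backward pass that produces FP16 gradients.

        `grads` are the true (unscaled) gradient values. Returns the recovered
        gradients for the optimizer, or None if this step must be skipped.
        """
        scaled_fp16 = [to_fp16(g * self.scale) for g in grads]
        if any(math.isinf(g) or math.isnan(g) for g in scaled_fp16):
            self.scale /= 2.0            # overflow: back off and skip the step
            self._clean_steps = 0
            return None
        unscaled = [g / self.scale for g in scaled_fp16]  # unscale in full precision
        self._clean_steps += 1
        if self._clean_steps == self.growth_interval:
            self.scale *= 2.0            # long run without overflow: try a larger scale
            self._clean_steps = 0
        return unscaled


if __name__ == "__main__":
    scaler = DynamicLossScaler()
    print(to_fp16(1e-8))        # 0.0: this gradient underflows without scaling
    print(scaler.step([1e-8]))  # roughly 1e-8: recovered thanks to the scale
    print(scaler.step([1.0]))   # None: 1.0 * 2**16 overflows FP16, scale halves
    print(scaler.scale)         # 32768.0
```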

💡 Using mixed precision with PyTorch and NVIDIA's Apex extension can significantly improve deep learning training performance on NVIDIA GPUs while maintaining accuracy.
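How large the improvement is depends on where a network spends its time: compute bound, bandwidth bound, or limited by something like data loading. A back-of-the-envelope way to reason about this is Amdahl's law: each share of the FP32 step time shrinks by its own factor, and the parts that do not speed up bound the end-to-end gain. The sketch below plugs in the talk's idealized figures (8x throughput for Tensor Core matrix multiplies and convolutions, about 2x for bandwidth-bound operations, no change for other work such as data loading). The time fractions and the function name are hypothetical, and real speedups will usually be lower since this ignores casting and other overheads.

```python
def estimated_speedup(fractions: dict, speedups: dict) -> float:
    """Amdahl's-law estimate of end-to-end training speedup.

    fractions: share of the FP32 step time spent in each category (sums to 1).
    speedups: speedup factor for each category under mixed precision.
    """
    if abs(sum(fractions.values()) - 1.0) > 1e-9:
        raise ValueError("time fractions must sum to 1")
    # New step time relative to the old one: each share shrinks by its own factor.
    new_time = sum(share / speedups[name] for name, share in fractions.items())
    return 1.0 / new_time


# Idealized per-category factors taken from the talk's hardware figures.
IDEAL_SPEEDUPS = {"matmul_conv": 8.0, "bandwidth_bound": 2.0, "other": 1.0}

if __name__ == "__main__":
    # Hypothetical profiles: a matmul-heavy model vs. one bottlenecked on data loading.
    matmul_heavy = {"matmul_conv": 0.7, "bandwidth_bound": 0.2, "other": 0.1}
    loader_bound = {"matmul_conv": 0.2, "bandwidth_bound": 0.3, "other": 0.5}
    print(round(estimated_speedup(matmul_heavy, IDEAL_SPEEDUPS), 2))  # 3.48
    print(round(estimated_speedup(loader_bound, IDEAL_SPEEDUPS), 2))  # 1.48
```

The matmul-heavy profile lands well above 2x, in line with the talk's point that networks such as BERT benefit most, while the loader-bound profile gains little no matter how fast the arithmetic gets.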
