Tips N Tricks # 8: Using automatic mixed precision training with PyTorch 1.6

Abhishek Thakur · Beginner ·🧬 Deep Learning ·6y ago

Key Takeaways

This video demonstrates how to use automatic mixed precision training with PyTorch 1.6 to train the BERT sentiment model, showcasing its benefits in reducing memory consumption and improving training speed.

Full Transcript

hello everyone and welcome to this new short video in this one I'm going to show you how you can use automatic mixed precision from PI Taj natively so PI dot 1.6 is going to have native support for automatic expression in training and mixer engine helps you in many different ways one of the things is your model will occupy less memory you so you can use larger batch sizes you can have faster training and till now we have been using it using Nvidia's apex but now since it's in built its natively supported by part or we can try using that one and see if it brings any kind of improvement so I should make a much longer video on a MP but I will probably if the time permits so you can you can read more about mixed provision training from this paper which is called an expression training and get to know more about it and in this one I'm going to show you how to use it so for just to start with we have already seen so this is the bird sentiment model that I'm using that I have trained a long time ago so if you have not taken a look at it you can take a look at this model it's also in the description box so I'm going to fight this line data parallel for now and then try to train the model so python train dot pi so I'm not changing anything in the model right now so let's see you let's see what happens so you can see that it's showing that the model is training and it's going to take around 32 minutes if we look at the memory consumption it's around 10 gigabytes of GPU memory 9009 9191 and so let me just stop it first and now we can try the mixed precision training and see what happens so q2 use mixed precision there are a few steps it's it's not very difficult so you can you have to import from torch to kuda import EMP which is automatic mixed precision so when you used Nvidia's epic see you used to import from apex import have a MP and then you have to define the scaler before anything begins so that's your gradient scaler grad scaler and then pass it on to the training function so let me just write a cheer scaler okay now we go to our training function and here when we are doing the forward pass so everything remains the same but when we are doing the forward pass we say like we have to use the context of auto casting so with a MP dot auto cast and here also you need to import so from taured CUDA import a MP so MP dot auto cast and then you do the forward pass of the model and also calculate the gloss and when you're done with that you have to do the backward function so in this one you have to just scaler dot scale loss and then backward and then the optimizer step so scaler dot step and then optimizer and then you have to update the scalar so scaler dot update so as you can see it's more straightforward and I think NVIDIA apex is also similar so there's not much difference and now we can start to train this model one more thing that I forgot was to include a scaler here okay and now we can train the model of and see what happens so as you can see now the model is training and showing 18 minutes so previously it was 32 minutes now it's 18 minutes so things are quite good it seems and if I look at memory consumptions now it's 8 gigabytes so we reduce 2 gigabytes of memory and that's that's how FB 16 or mix president training helps you automatic mix version so it's not just a p16 and one more thing to remember that in in the training we used data parallel in the original version so if you're using data parallel then you have to auto cast the forward function so what you can do is you can you can import from char store CUDA import EMP and then you can use you can use it in different ways so you can have the MP dot auto cast a decorator here or you can do with a MP dot auto cost so you can use this context and put everything inside this context so but we are going for the decorator and once you're once you've done that you can use the model in the same way so it's it's yeah it's not very difficult it's very simple and this is like one of the optimizations you should always go for it's going to make your training much faster provided your GPU supports mixed precision training which is Pascal or more yeah and that's it for today's video and I hope you liked it and subscribe my channel if you liked it and you liked click on the like button and share it with your friends so this is all about automatic mix precision in PI touch 1.6 it won't work with pythons 1.5 so you have to go to the nightly version just remember that and if you have any comments write me in the comment section and I would be happy to take a look and reply if you have any queries so thank you very much and see you next time goodbye

Original Description

In this Tips N Tricks video I show you how to use automatic mixed precision training ( #amp ) with #pytorch 1.6 to train the #BERT sentiment model. If you are not familiar with BERT sentiment model, take a look at this video: https://www.youtube.com/watch?v=hinZO--TEk4 Please subscribe and like the video to help me keep motivated to make awesome videos like this one. :) To buy my book, Approaching (Almost) Any Machine Learning problem, please visit: https://bit.ly/buyaaml Follow me on: Twitter: https://twitter.com/abhi1thakur LinkedIn: https://www.linkedin.com/in/abhi1thakur/ Kaggle: https://kaggle.com/abhishek
Watch on YouTube ↗ (saves to browser)
Sign in to unlock AI tutor explanation · ⚡30

Playlist

Uploads from Abhishek Thakur · Abhishek Thakur · 32 of 60

1 Episode 1.1: Intro and building a machine learning framework
Episode 1.1: Intro and building a machine learning framework
Abhishek Thakur
2 Episode 1.2: Building an inference for the machine learning framework
Episode 1.2: Building an inference for the machine learning framework
Abhishek Thakur
3 Episode 2: A Cross Validation Framework
Episode 2: A Cross Validation Framework
Abhishek Thakur
4 Tips N Tricks #2: Setting up development environment for machine learning
Tips N Tricks #2: Setting up development environment for machine learning
Abhishek Thakur
5 Episode 3: Handling Categorical Features in Machine Learning Problems
Episode 3: Handling Categorical Features in Machine Learning Problems
Abhishek Thakur
6 BERT on Steroids: Fine-tuning BERT for a dataset using PyTorch and Google Cloud TPUs
BERT on Steroids: Fine-tuning BERT for a dataset using PyTorch and Google Cloud TPUs
Abhishek Thakur
7 Special Announcement: Approaching (almost) any machine learning problem
Special Announcement: Approaching (almost) any machine learning problem
Abhishek Thakur
8 Training BERT Language Model From Scratch On TPUs
Training BERT Language Model From Scratch On TPUs
Abhishek Thakur
9 Bengali.AI: Handwritten Grapheme Classification Using PyTorch (Part-1)
Bengali.AI: Handwritten Grapheme Classification Using PyTorch (Part-1)
Abhishek Thakur
10 Bengali.AI: Handwritten Grapheme Classification Using PyTorch (Part-2)
Bengali.AI: Handwritten Grapheme Classification Using PyTorch (Part-2)
Abhishek Thakur
11 Episode 4: Simple and Basic Binary Classification Metrics
Episode 4: Simple and Basic Binary Classification Metrics
Abhishek Thakur
12 Training Sentiment Model Using BERT and Serving it with Flask API
Training Sentiment Model Using BERT and Serving it with Flask API
Abhishek Thakur
13 Episode 5: Entity Embeddings for Categorical Variables
Episode 5: Entity Embeddings for Categorical Variables
Abhishek Thakur
14 Tips N Tricks #5: 3 Simple and Easy Ways to Cache Functions in Python
Tips N Tricks #5: 3 Simple and Easy Ways to Cache Functions in Python
Abhishek Thakur
15 Multi-Lingual Toxic Comment Classification using BERT and TPUs with PyTorch
Multi-Lingual Toxic Comment Classification using BERT and TPUs with PyTorch
Abhishek Thakur
16 Text Extraction From a Corpus Using BERT (AKA Question Answering)
Text Extraction From a Corpus Using BERT (AKA Question Answering)
Abhishek Thakur
17 10K Subscribers: Approaching (almost) Any Machine Learning Problem and Talk Show
10K Subscribers: Approaching (almost) Any Machine Learning Problem and Talk Show
Abhishek Thakur
18 Data Processing For Question & Answering Systems: BERT vs. RoBERTa
Data Processing For Question & Answering Systems: BERT vs. RoBERTa
Abhishek Thakur
19 Tips N Tricks #6: How to train multiple deep neural networks on TPUs simultaneously
Tips N Tricks #6: How to train multiple deep neural networks on TPUs simultaneously
Abhishek Thakur
20 Sentencepiece Tokenizer With Offsets For T5, ALBERT, XLM-RoBERTa And Many More
Sentencepiece Tokenizer With Offsets For T5, ALBERT, XLM-RoBERTa And Many More
Abhishek Thakur
21 Talks # 1:Andrey Lukyanenko - Handwritten digit recognition w/ a twist &  topic modelling over time
Talks # 1:Andrey Lukyanenko - Handwritten digit recognition w/ a twist & topic modelling over time
Abhishek Thakur
22 Episode 6: Simple and Basic Evaluation Metrics For Regression
Episode 6: Simple and Basic Evaluation Metrics For Regression
Abhishek Thakur
23 Talks # 2: Subhaditya Mukherjee - Image restoration using Deep Learning: Dehazing
Talks # 2: Subhaditya Mukherjee - Image restoration using Deep Learning: Dehazing
Abhishek Thakur
24 Basic git commands everyone should know about
Basic git commands everyone should know about
Abhishek Thakur
25 How do I start my career in Data Science?
How do I start my career in Data Science?
Abhishek Thakur
26 Talks # 3: Lorenzo Ampil - Introduction to T5 for Sentiment Span Extraction
Talks # 3: Lorenzo Ampil - Introduction to T5 for Sentiment Span Extraction
Abhishek Thakur
27 Detecting Skin Cancer (Melanoma) With Deep Learning
Detecting Skin Cancer (Melanoma) With Deep Learning
Abhishek Thakur
28 Talks # 4: Sebastien Fischman - Pytorch-TabNet: Beating XGBoost on Tabular Data Using Deep Learning
Talks # 4: Sebastien Fischman - Pytorch-TabNet: Beating XGBoost on Tabular Data Using Deep Learning
Abhishek Thakur
29 Build a web-app to serve a deep learning model for skin cancer detection
Build a web-app to serve a deep learning model for skin cancer detection
Abhishek Thakur
30 Talks # 5: Parul Pandey: Data Science, Diversity and Kaggle
Talks # 5: Parul Pandey: Data Science, Diversity and Kaggle
Abhishek Thakur
31 Implementing original U-Net from scratch using PyTorch
Implementing original U-Net from scratch using PyTorch
Abhishek Thakur
Tips N Tricks # 8: Using automatic mixed precision training with PyTorch 1.6
Tips N Tricks # 8: Using automatic mixed precision training with PyTorch 1.6
Abhishek Thakur
33 Talks # 6: Mani Sarkar: From backend development to machine learning
Talks # 6: Mani Sarkar: From backend development to machine learning
Abhishek Thakur
34 Dockerizing the skin cancer detection web application
Dockerizing the skin cancer detection web application
Abhishek Thakur
35 How to train a deep learning model using docker?
How to train a deep learning model using docker?
Abhishek Thakur
36 Building an entity extraction model using BERT
Building an entity extraction model using BERT
Abhishek Thakur
37 Train custom object detection model with YOLO V5
Train custom object detection model with YOLO V5
Abhishek Thakur
38 Talks # 7: Moez Ali: Machine learning with PyCaret
Talks # 7: Moez Ali: Machine learning with PyCaret
Abhishek Thakur
39 How to convert almost any PyTorch model to ONNX and serve it using flask
How to convert almost any PyTorch model to ONNX and serve it using flask
Abhishek Thakur
40 Hyperparameter Optimization: This Tutorial Is All You Need
Hyperparameter Optimization: This Tutorial Is All You Need
Abhishek Thakur
41 I finally got a copy of "Approaching (Almost) Any Machine Learning Problem"
I finally got a copy of "Approaching (Almost) Any Machine Learning Problem"
Abhishek Thakur
42 Captcha recognition using PyTorch (Convolutional-RNN + CTC Loss)
Captcha recognition using PyTorch (Convolutional-RNN + CTC Loss)
Abhishek Thakur
43 Live Q&A: Getting Started With Data Science
Live Q&A: Getting Started With Data Science
Abhishek Thakur
44 WTFML: Simple, reusable code for PyTorch models
WTFML: Simple, reusable code for PyTorch models
Abhishek Thakur
45 Talks # 8: Sebastián Ramírez; Build a machine learning API  from scratch  with FastAPI
Talks # 8: Sebastián Ramírez; Build a machine learning API from scratch with FastAPI
Abhishek Thakur
46 Data Science PC Configs: From Low Range to Super-High Range
Data Science PC Configs: From Low Range to Super-High Range
Abhishek Thakur
47 BERT Model Architectures For Semantic Similarity
BERT Model Architectures For Semantic Similarity
Abhishek Thakur
48 I just got access to GitHub's Codespaces and it's amazing!
I just got access to GitHub's Codespaces and it's amazing!
Abhishek Thakur
49 Talks # 9: Vladimir Iglovikov; Detecting Masked Faces In The Pandemic World
Talks # 9: Vladimir Iglovikov; Detecting Masked Faces In The Pandemic World
Abhishek Thakur
50 Tips To Build A Good Data Science / Machine Learning Project (For Your Portfolio)
Tips To Build A Good Data Science / Machine Learning Project (For Your Portfolio)
Abhishek Thakur
51 Docker For Data Scientists
Docker For Data Scientists
Abhishek Thakur
52 How To Become A Data Scientist In 1 Year (Learn From A Real World Example)
How To Become A Data Scientist In 1 Year (Learn From A Real World Example)
Abhishek Thakur
53 Talks # 10: Tanishq Abraham; What are CycleGANs? (a novel deep learning tool in pathology)
Talks # 10: Tanishq Abraham; What are CycleGANs? (a novel deep learning tool in pathology)
Abhishek Thakur
54 Deploy Any Machine Learning Or Deep Learning Model On Google Cloud Platform (App Engine)
Deploy Any Machine Learning Or Deep Learning Model On Google Cloud Platform (App Engine)
Abhishek Thakur
55 Pair Programming: Deep Learning Model For Drug Classification With Andrey Lukyanenko
Pair Programming: Deep Learning Model For Drug Classification With Andrey Lukyanenko
Abhishek Thakur
56 VS Code (codeserver) on Google Colab / Kaggle / Anywhere
VS Code (codeserver) on Google Colab / Kaggle / Anywhere
Abhishek Thakur
57 Talks # 11: Jean-François Puget; Did you know GPUs are not just for Deep Learning?
Talks # 11: Jean-François Puget; Did you know GPUs are not just for Deep Learning?
Abhishek Thakur
58 End-to-End: Automated Hyperparameter Tuning For Deep Neural Networks
End-to-End: Automated Hyperparameter Tuning For Deep Neural Networks
Abhishek Thakur
59 Deploy Any Machine Learning (or Deep Learning) Endpoint on Google Cloud Platform In 10 minutes
Deploy Any Machine Learning (or Deep Learning) Endpoint on Google Cloud Platform In 10 minutes
Abhishek Thakur
60 Ensembling, Blending & Stacking
Ensembling, Blending & Stacking
Abhishek Thakur

This video teaches how to use automatic mixed precision training with PyTorch 1.6 to improve model training speed and reduce memory consumption. It covers the benefits and implementation of mixed precision training using PyTorch's built-in support.

Key Takeaways
  1. Import the automatic mixed precision module from PyTorch
  2. Define a gradient scaler
  3. Use auto casting for forward pass
  4. Scale loss and perform backward pass
  5. Update the scaler and optimizer
💡 Automatic mixed precision training can significantly reduce memory consumption and improve training speed, making it a valuable optimization technique for deep learning models.

Related AI Lessons

Want to get started with deep learning
Get started with deep learning by leveraging resources like Andrew Karpathy's playlist and frameworks such as TensorFlow or PyTorch
Reddit r/deeplearning
Building a Deepfake Detector From Scratch — What Nobody Tells You
Learn to build a deepfake detector from scratch and understand the challenges involved in detecting AI-generated fake media
Medium · Deep Learning
Unfolding the Meandering Path: High-Dimensional Invariance and the Flat 2D Plane of Neural…
Learn about high-dimensional invariance and its relation to the flat 2D plane of neural networks, and how to apply these concepts to improve model performance
Medium · Deep Learning
Implementing Neural Style Transfer from Scratch: The Project That Started It All
Learn to implement Neural Style Transfer from scratch and understand its significance in deep learning
Medium · Deep Learning
Up next
Image Classification with ml5.js
The Coding Train
Watch →