Tips N Tricks # 8: Using automatic mixed precision training with PyTorch 1.6

Abhishek Thakur · Beginner · 🧬 Deep Learning · 6y ago

Skills: ML Pipelines 80% · Supervised Learning 60%

Key Takeaways

This video demonstrates how to use automatic mixed precision training with PyTorch 1.6 to train the BERT sentiment model, showcasing its benefits in reducing memory consumption and improving training speed.

Full Transcript

Hello everyone, and welcome to this new short video. In this one, I'm going to show you how you can use automatic mixed precision natively from PyTorch. PyTorch 1.6 is going to have native support for automatic mixed precision training, and mixed precision helps you in many different ways: your model will occupy less memory, so you can use larger batch sizes, and you can have faster training. Until now we have been doing this with NVIDIA's Apex, but now that it's built in and natively supported by PyTorch, we can try using that and see if it brings any kind of improvement. I should make a much longer video on AMP, and I probably will if time permits. You can read more about mixed precision training in this paper, which is called "Mixed Precision Training", and in this video I'm going to show you how to use it.

To start with, this is the BERT sentiment model that I trained a long time ago. If you have not taken a look at it, you can; it's also in the description box. I'm going to comment out this DataParallel line for now and then try to train the model: python train.py. I'm not changing anything in the model right now, so let's see what happens. You can see that the model is training, and it's going to take around 32 minutes. If we look at the memory consumption, it's around 10 gigabytes of GPU memory: 9009, 9191.

Let me stop it first, and now we can try mixed precision training and see what happens. To use mixed precision there are a few steps; it's not very difficult. You have to import amp, which is automatic mixed precision: from torch.cuda import amp. When you used NVIDIA's Apex, you used to write from apex import amp. Then you have to define the scaler before anything begins, so that's your gradient scaler, GradScaler, and then pass it on to the training function. So let me just write scaler here.

Now we go to our training function. Everything remains the same, but when we are doing the forward pass we have to use the autocast context: with amp.autocast. Here you also need the import, from torch.cuda import amp. Inside amp.autocast you do the forward pass of the model and also calculate the loss. When you're done with that, you do the backward pass: scaler.scale(loss) and then backward. Then comes the optimizer step, so scaler.step with the optimizer, and then you have to update the scaler, so scaler.update. As you can see, it's more straightforward, and I think NVIDIA Apex is similar, so there's not much difference.

Now we can start to train this model. One more thing I forgot was to include the scaler here. Okay, now we can train the model and see what happens. As you can see, the model is training and showing 18 minutes. Previously it was 32 minutes, now it's 18 minutes, so things look quite good. And if I look at memory consumption, it's now 8 gigabytes, so we reduced memory by 2 gigabytes. That's how FP16, or rather automatic mixed precision training, helps you; it's not just FP16.

One more thing to remember: in the original version of the training we used DataParallel. If you're using DataParallel, then you have to autocast the forward function. You can import amp from torch.cuda and then use it in different ways: you can put amp.autocast as a decorator here, or you can use with amp.autocast as a context and put everything inside that context. We are going for the decorator. Once you've done that, you can use the model in the same way. It's not very difficult; it's very simple.

This is one of the optimizations you should always go for. It's going to make your training much faster, provided your GPU supports mixed precision training, which is Pascal or newer.

And that's it for today's video. I hope you liked it; if you did, subscribe to my channel, click on the like button, and share it with your friends. So this is all about automatic mixed precision in PyTorch 1.6. It won't work with PyTorch 1.5, so you have to go to the nightly version; just remember that. If you have any comments, write to me in the comment section, and I would be happy to take a look and reply to any queries. Thank you very much, and see you next time. Goodbye.
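The DataParallel remark in the transcript can be sketched roughly as follows. This is only an illustration under stated assumptions: SentimentModel, its single linear layer, and the tensor shapes are hypothetical stand-ins, not the BERT model from the video. The decorator is applied to forward because autocast state is thread-local and DataParallel runs each replica in its own thread, so a context opened around the outer call would not reach the replicas. The sketch disables autocast when no CUDA device is present so that it still runs.

```python
import torch
from torch import nn
from torch.cuda import amp

# Autocast only takes effect on a CUDA device; without one it is disabled here.
use_amp = torch.cuda.is_available()


class SentimentModel(nn.Module):
    """Hypothetical stand-in for the BERT sentiment model."""

    def __init__(self):
        super().__init__()
        self.out = nn.Linear(16, 1)

    # Decorating forward makes autocast active inside every DataParallel replica.
    @amp.autocast(enabled=use_amp)
    def forward(self, features):
        return self.out(features)


model = SentimentModel()
features = torch.randn(4, 16)
if use_amp:
    model = nn.DataParallel(model.cuda())
    features = features.cuda()

logits = model(features)
```

The alternative mentioned in the transcript is to open a with amp.autocast() block inside forward and put the whole body in it; the effect should be the same. Newer PyTorch releases may warn that the torch.cuda.amp spelling is deprecated in favour of torch.amp.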

Original Description

In this Tips N Tricks video I show you how to use automatic mixed precision training (#amp) with #pytorch 1.6 to train the #BERT sentiment model. If you are not familiar with the BERT sentiment model, take a look at this video: https://www.youtube.com/watch?v=hinZO--TEk4

Please subscribe and like the video to help me keep motivated to make awesome videos like this one. :)

To buy my book, Approaching (Almost) Any Machine Learning problem, please visit: https://bit.ly/buyaaml

Follow me on:
Twitter: https://twitter.com/abhi1thakur
LinkedIn: https://www.linkedin.com/in/abhi1thakur/
Kaggle: https://kaggle.com/abhishek

This video teaches how to use automatic mixed precision training with PyTorch 1.6 to improve model training speed and reduce memory consumption. It covers the benefits and implementation of mixed precision training using PyTorch's built-in support.

Key Takeaways

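Import the automatic mixed precision module (amp) from torch.cuda
Define a gradient scaler (GradScaler) before training begins
Use autocast for the forward pass and the loss calculation
Scale the loss and perform the backward pass
Step the optimizer through the scaler, then update the scaler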
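
The steps above can be sketched as a single training step. This is a minimal sketch, not the training script from the video: the linear model, the SGD optimizer, the loss function, and the random batch are hypothetical stand-ins for the BERT sentiment setup. AMP is switched on only when a CUDA device is available, so the same code falls back to ordinary full-precision training elsewhere.

```python
import torch
from torch import nn
from torch.cuda import amp  # step 1: import the AMP module

# AMP only has an effect on a CUDA device; otherwise it is disabled below.
use_amp = torch.cuda.is_available()
device = torch.device("cuda" if use_amp else "cpu")

torch.manual_seed(0)
model = nn.Linear(16, 1).to(device)        # hypothetical stand-in for the BERT model
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
loss_fn = nn.BCEWithLogitsLoss()
scaler = amp.GradScaler(enabled=use_amp)   # step 2: define the gradient scaler


def train_step(inputs, targets):
    optimizer.zero_grad()
    with amp.autocast(enabled=use_amp):    # step 3: autocast the forward pass and loss
        logits = model(inputs)
        loss = loss_fn(logits, targets)
    scaler.scale(loss).backward()          # step 4: scale the loss, then backward
    scaler.step(optimizer)                 # step 5: optimizer step through the scaler,
    scaler.update()                        #         then update the scale factor
    return loss.item()


# Random toy batch, used only to exercise the loop.
inputs = torch.randn(8, 16, device=device)
targets = torch.randint(0, 2, (8, 1), device=device).float()
losses = [train_step(inputs, targets) for _ in range(5)]
```

Only the forward pass and the loss sit inside the autocast context; the backward pass and the optimizer step happen outside it. Newer PyTorch releases may warn that the torch.cuda.amp spelling is deprecated in favour of torch.amp.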

💡 Automatic mixed precision training can significantly reduce memory consumption and improve training speed, making it a valuable optimization technique for deep learning models.
