Mixed Precision Training | Explanation and PyTorch Implementation from Scratch

ExplainingAI · Beginner · 📄 Research Papers Explained · 6mo ago
In this video, we break down Mixed Precision Training. You’ll learn why FP16, BF16, and FP32 matter, what we gain (and lose) when we switch precision, and how mixed precision training lets us train AI models faster and with fewer resources without sacrificing accuracy. We start by understanding floating point formats (specifically FP32) and what precision actually means, then transition to lower-precision formats like FP16 and BF16. We then explore the real benefits of lower precision, implement mixed precision from scratch, and finally switch to PyTorch’s built-in AMP for training our deep learning models.

Training deep neural networks keeps getting more expensive as models grow larger and more complex. Even with powerful GPUs, compute demand increases almost every year, so we need to make deep learning training as efficient as we can. Mixed precision training is one such technique: it allows us to train large AI models with roughly half the resources.

⏱️ Timestamps
00:00 Why care about Mixed Precision?
01:19 What is Precision? (FP32 vs FP16 vs BF16 Explained)
10:55 Why Lower Precision Helps
14:01 Mixed Precision Training From Scratch (Step-by-Step)
25:10 Loss Scaling
29:30 Mixed Precision Training in PyTorch (autocast + GradScaler)
31:50 Summary

📖 Resources
Mixed Precision Training paper: https://arxiv.org/pdf/1710.03740
Floating point representations (video): https://www.youtube.com/watch?v=bbkcEiUjehk
Denormalized numbers (video): https://www.youtube.com/watch?v=aPsSAEmwhgA

🔔 Subscribe: https://tinyurl.com/exai-channel-link
Email: explainingai.official@gmail.com
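To make the FP32 vs FP16 vs BF16 comparison from the first chapter concrete, here is a small pure-Python sketch that derives each format's range and precision from nothing but its bit layout. The helper name `format_limits` is illustrative, not from the video; the formulas are the standard ones for binary floating point (largest normal value and machine epsilon).

```python
# Pure-Python sketch: what each floating point format can represent,
# derived only from its bit layout (sign / exponent / mantissa bits).
# FP32 and FP16 follow IEEE 754; BF16 keeps FP32's exponent width.

def format_limits(exponent_bits: int, mantissa_bits: int):
    """Return (largest normal value, machine epsilon) for a binary float format."""
    emax = 2 ** (exponent_bits - 1) - 1            # largest usable exponent
    largest = (2 - 2.0 ** -mantissa_bits) * 2.0 ** emax
    epsilon = 2.0 ** -mantissa_bits                # gap between 1.0 and the next value
    return largest, epsilon

formats = {
    "FP32": (8, 23),   # 1 sign + 8 exponent + 23 mantissa bits
    "FP16": (5, 10),   # 1 sign + 5 exponent + 10 mantissa bits
    "BF16": (8, 7),    # 1 sign + 8 exponent + 7 mantissa bits
}

for name, (e, m) in formats.items():
    largest, eps = format_limits(e, m)
    print(f"{name}: max ≈ {largest:.3e}, eps = {eps:.3e}")
```

This makes the trade-off visible: BF16 keeps nearly FP32's range (max ≈ 3.4e38) but gives up precision (eps = 2⁻⁷), while FP16 keeps more precision (eps = 2⁻¹⁰) but overflows past 65504.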
Watch on YouTube ↗
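The "Loss Scaling" chapter describes multiplying the loss before backward so small FP16 gradients don't underflow, then unscaling the gradients and skipping the step on overflow. The sketch below is a framework-free illustration of that dynamic-scaling logic; the class and parameter names (`LossScaler`, `growth_interval`, etc.) are illustrative, not a real API, though the defaults mirror common practice.

```python
import math

# Minimal sketch of dynamic loss scaling, framework-free.
class LossScaler:
    def __init__(self, init_scale=2.0 ** 16, growth_factor=2.0,
                 backoff_factor=0.5, growth_interval=2000):
        self.scale = init_scale
        self.growth_factor = growth_factor      # grow scale after enough good steps
        self.backoff_factor = backoff_factor    # shrink scale on overflow
        self.growth_interval = growth_interval
        self._good_steps = 0

    def scale_loss(self, loss):
        # Scale the loss so small FP16 gradients don't underflow to zero.
        return loss * self.scale

    def unscale_and_check(self, scaled_grads):
        # Undo the scaling; report whether any gradient overflowed (inf/nan).
        grads = [g / self.scale for g in scaled_grads]
        found_inf = any(not math.isfinite(g) for g in scaled_grads)
        return grads, found_inf

    def update(self, found_inf):
        # On overflow: the optimizer step is skipped and the scale backs off.
        # Otherwise: count a good step, and grow after growth_interval of them.
        if found_inf:
            self.scale *= self.backoff_factor
            self._good_steps = 0
        else:
            self._good_steps += 1
            if self._good_steps >= self.growth_interval:
                self.scale *= self.growth_factor
                self._good_steps = 0

# Usage: an overflowing gradient halves the scale, and that step is skipped.
scaler = LossScaler(init_scale=1024.0)
grads, found_inf = scaler.unscale_and_check([float("inf"), 0.5])
scaler.update(found_inf)
print(scaler.scale)  # 512.0
```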
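The final chapter switches to PyTorch's built-in AMP. A minimal sketch of that `autocast` + `GradScaler` pattern is below; the toy model, data, and hyperparameters are placeholders, and it falls back to BF16 on CPU so it runs without a GPU (FP16 + a live scaler are used on CUDA, where loss scaling actually matters).

```python
import torch
import torch.nn as nn

# Toy model and data, just to show the AMP training-loop pattern.
device = "cuda" if torch.cuda.is_available() else "cpu"
model = nn.Linear(16, 1).to(device)
optimizer = torch.optim.SGD(model.parameters(), lr=1e-2)
# GradScaler is only needed for FP16; disabled, its calls become no-ops.
scaler = torch.cuda.amp.GradScaler(enabled=(device == "cuda"))
amp_dtype = torch.float16 if device == "cuda" else torch.bfloat16

x = torch.randn(8, 16, device=device)
y = torch.randn(8, 1, device=device)

for step in range(3):
    optimizer.zero_grad(set_to_none=True)
    # autocast runs eligible ops (matmuls, convs) in lower precision.
    with torch.autocast(device_type=device, dtype=amp_dtype):
        loss = nn.functional.mse_loss(model(x), y)
    scaler.scale(loss).backward()   # scale loss against FP16 gradient underflow
    scaler.step(optimizer)          # unscales grads; skips the step on inf/nan
    scaler.update()                 # adjusts the scale factor dynamically
```

Note that the master weights, the optimizer state, and the loss stay in FP32; only the autocast region runs in lower precision, which is what makes this "mixed" precision rather than pure half precision.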

