Training models with only 4 bits | Fully-Quantized Training

Julia Turc · Advanced · 📐 ML Fundamentals · 11mo ago
Can you really train a large language model in just 4 bits? In this video, we explore the cutting edge of model compression: fully quantized training in FP4 (4-bit floating point). While quantization has traditionally focused on inference, new research pushes the limits of training efficiency, reducing memory, compute, and cost.

🧠 We cover:
✅ NVIDIA Tensor Cores for mixed-precision training
✅ Micro-scaling (MX) data formats
✅ Modeling tricks for 4-bit gradients (e.g. stochastic rounding)

📎 Resources:
🔵 Main paper: https://arxiv.org/abs/2505.19115
🔵 US congressional report on DeepSeek: https://selectcommitteeontheccp.house.gov/sites/evo-subsites/selectcommitteeontheccp.house.gov/files/evo-media-document/DeepSeek%20Final.pdf
🔵 Slide deck and full reading list: https://www.patreon.com/c/JuliaTurc

Watch the entire quantization series here: https://youtube.com/playlist?list=PL4bm2lr9UVG0HvePBXvsceO4yuLC8HhUh&si=xLu7vxMfNdJxkB0S

00:00 Intro
01:00 Motivation (training is expensive)
03:06 Mixed precision
05:40 Hardware support: FP4 in NVIDIA Blackwell
13:51 Microscaling formats (MXFP4 & NVFP4)
17:45 Why not INT4?
19:51 Modeling tricks: Stochastic Rounding
22:26 Outro
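The micro-scaling idea behind MXFP4 can be sketched in a few lines: each block of 32 values shares a single power-of-two scale (stored as an 8-bit exponent, E8M0), while every element is a 4-bit E2M1 float whose magnitudes are limited to {0, 0.5, 1, 1.5, 2, 3, 4, 6}. Below is a minimal NumPy sketch of a round-to-nearest MXFP4-style quantizer; the grid, block size of 32, and power-of-two scale rule follow the public MX format description, but the function name, padding behavior, and clamping details are illustrative assumptions, not NVIDIA's implementation:

```python
import numpy as np

# FP4 (E2M1) representable magnitudes: 1 sign bit, 2 exponent bits, 1 mantissa bit
FP4_GRID = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])

def mxfp4_quantize(x, block=32):
    """Quantize-dequantize a 1-D array in an MXFP4-like way:
    each block of `block` values shares one power-of-two scale,
    and each scaled value is rounded to the nearest FP4 number."""
    x = np.asarray(x, dtype=np.float64)
    pad = (-len(x)) % block
    xp = np.pad(x, (0, pad)).reshape(-1, block)
    # Per-block power-of-two scale: subtract 2 because FP4's largest
    # exponent is 2 (max value 6 = 1.5 * 2^2). Blocks whose max lands
    # slightly above 6 after scaling simply clamp to 6.
    amax = np.abs(xp).max(axis=1, keepdims=True)
    scale = 2.0 ** (np.floor(np.log2(np.where(amax > 0.0, amax, 1.0))) - 2.0)
    # Round each scaled magnitude to the nearest FP4 grid point
    scaled = xp / scale
    idx = np.abs(np.abs(scaled)[..., None] - FP4_GRID).argmin(axis=-1)
    q = np.sign(scaled) * FP4_GRID[idx]
    return (q * scale).reshape(-1)[:len(x)]
```

Note the key trade-off the video discusses: because the shared scale is a pure power of two, the hardware only needs an exponent per block (one byte for 32 elements), keeping the effective bit width close to 4 bits per value.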

