Training models with only 4 bits | Fully-Quantized Training
Can you really train a large language model in just 4 bits? In this video, we explore the cutting edge of model compression: fully quantized training in FP4 (4-bit floating point). While quantization has traditionally focused on inference, new research pushes it into the training loop itself, reducing memory, compute, and cost.
🧠 We cover:
✅ NVIDIA Tensor Cores for mixed-precision training
✅ Micro-scaling (MX) data formats
✅ Modeling tricks for 4-bit gradients (e.g. Stochastic Rounding)
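To make the stochastic-rounding trick concrete, here is a minimal NumPy sketch (function name and setup are ours, not from the video): each value is rounded down or up at random, with probabilities chosen so the rounding is unbiased in expectation. Real FP4 training rounds to the nearest representable FP4 values rather than to integers, but the idea is the same: unbiased rounding lets tiny gradient updates survive on average instead of being deterministically rounded away.

```python
import numpy as np

def stochastic_round(x, rng):
    # Round down with probability (1 - frac) and up with probability frac,
    # where frac is the fractional part. Then E[stochastic_round(x)] == x,
    # unlike round-to-nearest, which would map 0.25 to 0 every time.
    floor = np.floor(x)
    frac = x - floor
    return floor + (rng.random(x.shape) < frac)

rng = np.random.default_rng(0)
x = np.full(100_000, 0.25)   # e.g. a small gradient value, below half a step
rounded = stochastic_round(x, rng)
print(rounded.mean())        # close to 0.25; round-to-nearest would give 0.0
```

The same principle applies per quantization step in a 4-bit grid: the step size is larger, so unbiasedness matters even more.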
📎 Resources:
🔵 Main paper: https://arxiv.org/abs/2505.19115
🔵 US congressional report on DeepSeek: ht…
Chapters (8)
Intro · 1:00
Motivation (training is expensive) · 3:06
Mixed precision · 5:40
Hardware support: FP4 in NVIDIA Blackwell · 13:51
Microscaling formats (MXFP4 & NVFP4) · 17:45
Why not INT4? · 19:51
Modeling tricks: Stochastic Rounding · 22:26
Outro
DeepCamp AI