Training models with only 4 bits | Fully-Quantized Training

Julia Turc · Advanced · 📐 ML Fundamentals · 9mo ago
Can you really train a large language model in just 4 bits? In this video, we explore the cutting edge of model compression: fully quantized training in FP4 (4-bit floating point). While quantization has traditionally focused on inference, new research pushes the limits of training efficiency, reducing memory, compute, and cost.

🧠 We cover:
✅ NVIDIA Tensor Cores for mixed-precision training
✅ Micro-scaling (MX) data formats
✅ Modeling tricks for 4-bit gradients (e.g. stochastic rounding)

📎 Resources:
🔵 Main paper: https://arxiv.org/abs/2505.19115
🔵 US congressional report on DeepSeek: ht…
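To make the micro-scaling idea concrete, here is a minimal sketch of MX-style block quantization: a group of values shares one power-of-two scale, and each element is snapped to the nearest representable FP4 (E2M1) value. This is an illustration only, not the exact MXFP4/NVFP4 spec (real MX formats fix the block size at 32 and encode the shared scale as an 8-bit exponent; the `quantize_mxfp4_block` helper below is hypothetical).

```python
import numpy as np

# Representable magnitudes of FP4 E2M1 (sign is handled separately).
FP4_GRID = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])

def quantize_mxfp4_block(block: np.ndarray) -> tuple[np.ndarray, float]:
    """Quantize one block with a single shared power-of-two scale,
    then snap every element to the nearest FP4 magnitude.
    Returns the dequantized block and the scale used."""
    amax = float(np.abs(block).max())
    if amax == 0.0:
        return block.copy(), 1.0
    # Shared scale: smallest power of two so the block max fits in the grid.
    scale = 2.0 ** np.ceil(np.log2(amax / FP4_GRID[-1]))
    scaled = block / scale
    # Snap each magnitude to the nearest grid point; restore signs.
    idx = np.abs(np.abs(scaled)[:, None] - FP4_GRID[None, :]).argmin(axis=1)
    q = np.sign(scaled) * FP4_GRID[idx]
    return q * scale, float(scale)
```

Because the scale is shared per small block rather than per tensor, one outlier only degrades the precision of its own block, which is the core appeal of MX formats for training.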

Chapters (8)

0:00 Intro
1:00 Motivation (training is expensive)
3:06 Mixed precision
5:40 Hardware support: FP4 in NVIDIA Blackwell
13:51 Microscaling formats (MXFP4 & NVFP4)
17:45 Why not INT4?
19:51 Modeling tricks: Stochastic Rounding
22:26 Outro