Training models with only 4 bits | Fully-Quantized Training

Julia Turc · Advanced · 📐 ML Fundamentals · 9mo ago
Can you really train a large language model in just 4 bits? In this video, we explore the cutting edge of model compression: fully quantized training in FP4 (4-bit floating point). While quantization has traditionally focused on inference, new research pushes the limits of training efficiency, reducing memory, compute, and cost.

🧠 We cover:
✅ NVIDIA Tensor Cores for mixed-precision training
✅ Micro-scaling (MX) data formats
✅ Modeling tricks for 4-bit gradients (e.g. stochastic rounding)

📎 Resources:
🔵 Main paper: https://arxiv.org/abs/2505.19115
🔵 US congressional report on DeepSeek: ht…
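To make the micro-scaling idea concrete, here is a minimal sketch of MX-style block quantization: a group of values shares one power-of-two scale, and each element is snapped to the nearest representable FP4 (E2M1) value. This is an illustration only, not the exact MXFP4/NVFP4 spec (real MX formats fix the block size at 32 and encode the shared scale as an 8-bit exponent; the `quantize_mxfp4_block` helper below is hypothetical).

```python
import numpy as np

# Representable magnitudes of FP4 E2M1 (sign is handled separately).
FP4_GRID = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])

def quantize_mxfp4_block(block: np.ndarray) -> tuple[np.ndarray, float]:
    """Quantize one block with a single shared power-of-two scale,
    then snap every element to the nearest FP4 magnitude.
    Returns the dequantized block and the scale used."""
    amax = float(np.abs(block).max())
    if amax == 0.0:
        return block.copy(), 1.0
    # Shared scale: smallest power of two so the block max fits in the grid.
    scale = 2.0 ** np.ceil(np.log2(amax / FP4_GRID[-1]))
    scaled = block / scale
    # Snap each magnitude to the nearest grid point; restore signs.
    idx = np.abs(np.abs(scaled)[:, None] - FP4_GRID[None, :]).argmin(axis=1)
    q = np.sign(scaled) * FP4_GRID[idx]
    return q * scale, float(scale)
```

Because the scale is shared per small block rather than per tensor, one outlier only degrades the precision of its own block, which is the core appeal of MX formats for training.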

Chapters (8)

0:00 Intro
1:00 Motivation (training is expensive)
3:06 Mixed precision
5:40 Hardware support: FP4 in NVIDIA Blackwell
13:51 Microscaling formats (MXFP4 & NVFP4)
17:45 Why not INT4?
19:51 Modeling tricks: Stochastic Rounding
22:26 Outro