The myth of 1-bit LLMs | Quantization-Aware Training

Julia Turc · Advanced · 📄 Research Papers Explained · 10mo ago
Are 1-bit LLMs the future of efficient AI, or just a catchy Microsoft metaphor? In this video, we break down BitNet, the so-called “1-bit LLM” that isn’t really 1-bit, but still delivers massive speed and memory gains through extreme quantization.

🔍 What you’ll learn:
• What fractional (1.58) bits are
• How BitNet works under the hood (BitLinear, ELUT, TL1/TL2)
• The role of quantization-aware training (QAT) and the Straight-Through Estimator (STE)
• Optimizations for ternary matrix multiplication
• How 1-bit LLMs scale with parameter count

📄 Main paper: https://arxiv.org/abs/2402.1776…
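The quantization scheme the video covers can be illustrated with a minimal sketch. This is not the video's or Microsoft's implementation, just an illustrative Python version of absmean ternary quantization (the BitNet b1.58 recipe: scale weights by their mean absolute value, then round to {-1, 0, +1}); the function name and example values are my own.

```python
import numpy as np

def ternary_quantize(w, eps=1e-6):
    # Absmean quantization: scale by the mean |w|, then round each
    # weight to the nearest value in {-1, 0, +1}.
    scale = np.mean(np.abs(w)) + eps
    q = np.clip(np.round(w / scale), -1, 1)
    return q, scale

# During QAT, the Straight-Through Estimator uses q * scale in the
# forward pass but treats quantization as the identity in the backward
# pass, so gradients update the full-precision latent weights.
w = np.array([0.8, -0.05, -1.2, 0.3])
q, scale = ternary_quantize(w)
```

With these example weights, the mean absolute value is about 0.59, so 0.8 and 0.3 round to +1, -1.2 rounds to -1 (after clipping), and -0.05 rounds to 0.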

Chapters (9)

0:00 Intro
1:05 Inspiration and motivation
5:20 BitNet model architecture
10:21 Quantization-Aware Training
15:21 Storing fractional bits: bitpacking & ELUT
18:12 Open-weights models in HuggingFace
19:52 Ternary matrix multiplication
21:20 Demo & evaluation
23:59 Outro