LLM Quantization Explained: GPTQ, AWQ, QLoRA, GGUF and More
00:00 Introduction to LLM Quantization
02:15 What is Quantization?
04:45 Post-Training Quantization (PTQ) vs. QAT
07:30 GPTQ (Post-Training Quantization for GPT)
11:12 AWQ (Activation-aware Weight Quantization)
14:20 QLoRA (Quantized Low-Rank Adaptation)
18:05 GGUF and llama.cpp
22:30 EXL2 (ExLlamaV2)
25:50 How to Choose the Right Format
28:40 The Future of Quantization
What does it take to run a massive large language model on consumer hardware? In this video, we break down LLM quantization from first principles and show how techniques like INT8, INT4, GPTQ, AWQ, NF4, QLoRA, GGUF, SmoothQuan…
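The description promises a breakdown of quantization from first principles. As context for the techniques named above, here is a minimal sketch of absmax INT8 quantization, the simplest scheme the listed methods build on; it is illustrative only and not taken from the video itself.

```python
# Minimal sketch of absmax (per-tensor) INT8 quantization.
# Illustrative only; the video's methods (GPTQ, AWQ, NF4, ...) refine this idea.

def quantize_int8(weights):
    """Map float weights to signed 8-bit codes using one shared scale."""
    scale = max(abs(w) for w in weights) / 127  # largest magnitude maps to 127
    codes = [max(-128, min(127, round(w / scale))) for w in weights]
    return codes, scale

def dequantize_int8(codes, scale):
    """Recover approximate float weights from the integer codes."""
    return [c * scale for c in codes]

w = [0.5, -1.2, 0.03, 2.4]
codes, scale = quantize_int8(w)
w_hat = dequantize_int8(codes, scale)
# Rounding guarantees each reconstructed weight is within scale/2 of the original.
```

Storing 8-bit codes plus one scale factor per tensor is what cuts memory roughly 4x versus FP32; the more advanced formats in the chapters reduce the rounding error this introduces.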
DeepCamp AI