LLM Quantization Explained: GPTQ, AWQ, QLoRA, GGUF and More

Tales Of Tensors · Beginner · 🧠 Large Language Models · 2w ago
What does it take to run a massive large language model on consumer hardware? In this video, we break down LLM quantization from first principles and show how techniques like INT8, INT4, GPTQ, AWQ, NF4, QLoRA, GGUF, SmoothQuan…
Watch on YouTube ↗

Chapters (10)

0:00 Introduction to LLM Quantization
2:15 What is Quantization?
4:45 Post-Training Quantization (PTQ) vs. QAT
7:30 GPTQ (Post-Training Quantization for GPT)
11:12 AWQ (Activation-aware Weight Quantization)
14:20 QLoRA (Quantized Low-Rank Adaptation)
18:05 GGUF and llama.cpp
22:30 EXL2 (ExLlamaV2)
25:50 How to Choose the Right Format
28:40 The Future of Quantization
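The description's core idea (mapping high-precision float weights to small integers like INT8) can be illustrated with a minimal sketch of symmetric per-tensor quantization. This is a generic illustration of the concept, not the GPTQ/AWQ/GGUF implementations the video covers; all function names here are hypothetical.

```python
import numpy as np

def quantize_int8(weights: np.ndarray):
    """Map float weights to int8 using one shared scale factor."""
    scale = np.abs(weights).max() / 127.0  # largest magnitude maps to 127
    q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize_int8(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover approximate float weights from the int8 values."""
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
w = rng.normal(size=1000).astype(np.float32)

q, scale = quantize_int8(w)
w_hat = dequantize_int8(q, scale)

# Rounding error per weight is bounded by half a quantization step.
max_err = np.abs(w - w_hat).max()
assert max_err <= scale / 2 + 1e-6
```

The weights now occupy one byte each instead of four, at the cost of a small, bounded rounding error; techniques like GPTQ and AWQ refine this basic recipe to push precision down to 4 bits with minimal quality loss.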