LLM Fine-Tuning 13: LLM Quantization Explained (PART 2) | PTQ, QAT, GPTQ, AWQ, GGUF, GGML, llama.cpp
Welcome to Episode 13 of the LLM Fine-Tuning Series — Quantization Part 2!
In this video, we move beyond the basics and explore advanced quantization techniques that power today’s large language models (LLMs).
If you’ve watched Part 1 (Introduction to Quantization), you’re now ready to dive into real-world algorithms and formats like GPTQ, AWQ, GGML, GGUF, and QAT.
What You’ll Learn in This Video
1️⃣ Advanced LLM Quantization Overview – Why scaling models up requires smarter compression
2️⃣ GPTQ (Post-Training Quantization for Generative Pre-trained Transformers) – How it reduces memory while preserving accuracy (see the first sketch after this list)
3️⃣ AWQ (Activation-Aware Weight Quantization) – Minimizing quantization error with smarter calibration (second sketch below)
4️⃣ Quantization-Aware Training (QAT) – Teaching models to learn with quantization noise (third sketch below)
5️⃣ GGML (Georgi Gerganov's Machine Learning library) – Lightweight inference engine for CPUs & edge devices
6️⃣ GGUF (GPT-Generated Unified Format) – Standardized, future-proof quantization format for LLMs (fourth sketch below)
7️⃣ Hands-on Practical Demo with LLMs – Applying quantization and measuring speed & memory gains (fifth sketch below)
8️⃣ Choosing the Right Technique – When to use GPTQ, AWQ, GGUF, or QAT
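To preview the GPTQ demo, here is a minimal post-training quantization sketch using the GPTQConfig integration in Hugging Face transformers (it needs the optimum and auto-gptq packages); the model ID and calibration dataset are illustrative placeholders, not necessarily what the video uses:

```python
# Minimal GPTQ post-training quantization sketch via the transformers
# integration (pip install optimum auto-gptq). Model ID and calibration
# dataset are illustrative placeholders.
from transformers import AutoModelForCausalLM, AutoTokenizer, GPTQConfig

model_id = "facebook/opt-125m"  # small model so the demo runs quickly
tokenizer = AutoTokenizer.from_pretrained(model_id)

# 4-bit weights, calibrated on the "c4" dataset bundled with the integration.
gptq_config = GPTQConfig(bits=4, dataset="c4", tokenizer=tokenizer)

# Quantization happens layer by layer during loading, so this takes a while.
model = AutoModelForCausalLM.from_pretrained(
    model_id, device_map="auto", quantization_config=gptq_config
)

model.save_pretrained("opt-125m-gptq-4bit")
tokenizer.save_pretrained("opt-125m-gptq-4bit")
```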
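Likewise, a rough AWQ sketch with the autoawq library; the model path, output directory, and quant_config values below are assumptions based on the library's common defaults:

```python
# AWQ quantization sketch with the autoawq package (pip install autoawq).
# Paths are arbitrary; quant_config mirrors the library's common defaults.
from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer

model_path = "mistralai/Mistral-7B-v0.1"  # illustrative base model
quant_path = "mistral-7b-awq"             # output directory

model = AutoAWQForCausalLM.from_pretrained(model_path)
tokenizer = AutoTokenizer.from_pretrained(model_path)

# 4-bit weights, group size 128, zero-point quantization, GEMM kernels.
quant_config = {"zero_point": True, "q_group_size": 128, "w_bit": 4, "version": "GEMM"}

# Activation-aware calibration: sample activations identify the salient
# weight channels that get protected by per-channel scaling before quantizing.
model.quantize(tokenizer, quant_config=quant_config)

model.save_quantized(quant_path)
tokenizer.save_pretrained(quant_path)
```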
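For QAT, PyTorch's eager-mode quantization API shows the core idea: train with fake-quant observers so the weights adapt to rounding noise, then convert to real int8 modules. The toy network and random data are stand-ins for the video's actual example:

```python
# Quantization-aware training sketch using PyTorch's eager-mode QAT API.
import torch
import torch.nn as nn
from torch.ao.quantization import (
    QuantStub, DeQuantStub, get_default_qat_qconfig, prepare_qat, convert
)

class TinyNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.quant = QuantStub()      # fake-quantizes the input
        self.fc1 = nn.Linear(16, 32)
        self.relu = nn.ReLU()
        self.fc2 = nn.Linear(32, 4)
        self.dequant = DeQuantStub()  # returns float outputs

    def forward(self, x):
        x = self.quant(x)
        x = self.relu(self.fc1(x))
        x = self.fc2(x)
        return self.dequant(x)

model = TinyNet().train()
model.qconfig = get_default_qat_qconfig("fbgemm")
qat_model = prepare_qat(model)  # inserts fake-quant observers

optimizer = torch.optim.SGD(qat_model.parameters(), lr=0.01)
loss_fn = nn.CrossEntropyLoss()

# Train with simulated quantization noise so weights adapt to rounding.
for _ in range(100):
    x, y = torch.randn(8, 16), torch.randint(0, 4, (8,))
    optimizer.zero_grad()
    loss_fn(qat_model(x), y).backward()
    optimizer.step()

int8_model = convert(qat_model.eval())  # real int8 modules for deployment
print(int8_model(torch.randn(1, 16)))
```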
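For GGML/GGUF, once a model has been converted and quantized to a .gguf file with llama.cpp's tools, running it from Python can look like this via the llama-cpp-python bindings; the filename is a hypothetical local file:

```python
# Running a GGUF-quantized model with llama-cpp-python
# (pip install llama-cpp-python), the Python bindings for llama.cpp.
# The model path is a hypothetical local Q4_K_M file.
from llama_cpp import Llama

llm = Llama(
    model_path="./mistral-7b-instruct.Q4_K_M.gguf",
    n_ctx=2048,    # context window size
    n_threads=8,   # CPU threads; llama.cpp targets CPU/edge inference
)

output = llm("Q: What is quantization in one sentence? A:", max_tokens=64)
print(output["choices"][0]["text"])
```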
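And to measure the memory gains the demo talks about, transformers' built-in get_memory_footprint() helper gives a quick before/after comparison; the 4-bit load below uses bitsandbytes as one illustrative option rather than the exact method from the video:

```python
# Before/after memory comparison using transformers' get_memory_footprint().
# The 4-bit load uses bitsandbytes (pip install bitsandbytes) purely as one
# easy illustration; GPTQ/AWQ checkpoints can be compared the same way.
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

model_id = "facebook/opt-125m"  # illustrative small model

fp_model = AutoModelForCausalLM.from_pretrained(model_id)
print(f"full precision: {fp_model.get_memory_footprint() / 1e6:.1f} MB")

q_model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=BitsAndBytesConfig(load_in_4bit=True),
    device_map="auto",  # bitsandbytes 4-bit requires a GPU
)
print(f"4-bit:          {q_model.get_memory_footprint() / 1e6:.1f} MB")
```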
By the end of this video, you’ll be equipped to deploy massive LLMs on limited hardware with blazing speed and lower costs.
Material & Resources: https://github.com/sunnysavita10/Complete-LLM-Finetuning/tree/main/LLM-Quantization-Part-2
🔔 Like, Share & Subscribe to stay updated with the full LLM fine-tuning playlist.
Got questions or topic requests? Drop a comment below 👇.
⏱️ Timestamps / Chapters:
00:00 Introduction to LLM Quantization
06:13 Advanced LLM Quantization Concepts
27:19 PTQ vs QAT for Large Language Models
44:52 GPTQ Theory (Post-Training Quantization for Generative Pre-trained Transformers)
01:35:09 GPTQ Hands-on Practical Example
02:10:49 AWQ Theory (Activation-Aware Quantization)
02:27:07 AWQ Hands-on Practical Example
02:35:30 QAT (Quantization Aware Training) Theory + Practical
02:56:17 GGML & GGUF Explained for LLM Inference