LLM Fine-Tuning 13: LLM Quantization Explained (PART 2) | PTQ, QAT, GPTQ, AWQ, GGUF, GGML, llama.cpp

Sunny Savita · Beginner · 🧠 Large Language Models · 7mo ago
Welcome to Episode 13 of the LLM Fine-Tuning Series: Quantization Part 2! In this video, we move beyond the basics and explore the advanced quantization techniques that power today's large language models (LLMs). If you've watched Part 1 (Introduction to Quantization), you're ready to dive into real-world algorithms and formats such as GPTQ, AWQ, GGML, GGUF, and QAT.

What You'll Learn in This Video
1️⃣ Advanced LLM Quantization Overview – why scaling models up requires smarter compression
2️⃣ GPTQ (Post-Training Quantization for Generative Pre-trained Transformers) – how it reduces memory while preserving accuracy
3️⃣ AWQ (Activation-Aware Weight Quantization) – minimizing quantization error with activation-guided calibration
4️⃣ Quantization-Aware Training (QAT) – teaching models to train under quantization noise
5️⃣ GGML (Georgi Gerganov's Machine Learning library) – a lightweight inference engine for CPUs and edge devices
6️⃣ GGUF (GGML Universal File format) – a standardized, future-proof model file format for LLMs
7️⃣ Hands-on Practical Demo with LLMs – applying quantization and measuring speed and memory gains
8️⃣ Choosing the Right Technique – when to use GPTQ, AWQ, GGUF, or QAT

By the end of this video, you'll be equipped to deploy massive LLMs on limited hardware at high speed and lower cost.

Material & Resources: https://github.com/sunnysavita10/Complete-LLM-Finetuning/tree/main/LLM-Quantization-Part-2

🔔 Like, Share & Subscribe to stay updated with the full LLM fine-tuning playlist. Got questions or topic requests? Drop a comment below 👇

⏱️ Timestamps / Chapters:
00:00 Introduction to LLM Quantization
06:13 Advanced LLM Quantization Concepts
27:19 PTQ vs QAT for Large Language Models
44:52 GPTQ Theory
01:35:09 GPTQ Hands-on Practical Example
02:10:49 AWQ Theory (Activation-Aware Quantization)
02:27:07 AWQ Hands-on Practical Example
02:35:30 QAT (Quantization-Aware Training) Theory + Practical
02:56:17 GGML & GGUF Explained for LLM Inference
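As a warm-up for the PTQ portion of the video, here is a minimal sketch of the simplest post-training quantization scheme: symmetric absmax round-to-nearest on a weight matrix. This is the naive baseline that methods like GPTQ improve on. The shapes, bit width, and values are illustrative assumptions, not code from the video.

```python
# Toy post-training quantization (PTQ): symmetric round-to-nearest int4
# quantization of a weight matrix with a single absmax scale.
# All shapes and values are made up for illustration.
import numpy as np

def quantize_symmetric(w: np.ndarray, bits: int = 4):
    """Quantize weights to signed integers with one absmax scale."""
    qmax = 2 ** (bits - 1) - 1            # e.g. 7 for int4
    scale = np.abs(w).max() / qmax        # absmax scaling factor
    q = np.clip(np.round(w / scale), -qmax - 1, qmax).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
w = rng.normal(0, 0.02, size=(16, 16)).astype(np.float32)
q, scale = quantize_symmetric(w, bits=4)
w_hat = dequantize(q, scale)
err = np.abs(w - w_hat).mean()            # mean rounding error per weight
print(f"mean abs quantization error: {err:.6f}")
```

GPTQ's contribution, covered in the theory chapter, is to quantize weights one by one while compensating the remaining weights for the rounding error, instead of rounding everything independently as above.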
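The AWQ chapters center on one key observation: weight channels that see large activations matter most, and they can be protected by scaling them up before quantization while folding the inverse scale into the activations. The sketch below illustrates that idea in numpy; the `s = 2.0` scaling rule, the per-column quantizer, and all values are simplifying assumptions, not the exact AWQ algorithm.

```python
# Toy AWQ-style channel scaling: the transform x -> x/s, W -> s*W is
# mathematically output-preserving, so it can be applied before quantization
# to shrink the relative rounding error on "salient" channels.
import numpy as np

def quantize_per_column(w, bits=4):
    """Round-to-nearest fake quantization with one scale per output column."""
    qmax = 2 ** (bits - 1) - 1
    scale = np.abs(w).max(axis=0, keepdims=True) / qmax
    q = np.clip(np.round(w / scale), -qmax - 1, qmax)
    return q * scale                      # dequantized ("fake-quantized") weights

rng = np.random.default_rng(1)
w = rng.uniform(-1, 1, size=(16, 8))
w[0] *= 0.3                               # salient channel: small weights...
x = np.ones(16); x[0] = 50.0              # ...but very large activations

s = np.where(np.abs(x) > 10, 2.0, 1.0)    # scale up salient channels (toy rule)
w_scaled = w * s[:, None]
x_scaled = x / s

# Before quantization, the scaling changes nothing:
assert np.allclose(x @ w, x_scaled @ w_scaled)

err_plain = np.abs(x @ w - x @ quantize_per_column(w)).mean()
err_awq = np.abs(x @ w - x_scaled @ quantize_per_column(w_scaled)).mean()
print(f"plain RTN output error: {err_plain:.4f}, with channel scaling: {err_awq:.4f}")
```

Real AWQ searches for the scaling factors using calibration activations and quantizes in groups along the input channels, but the output-preserving scaling trick is the same.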
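For the QAT chapter, the core mechanism is "fake quantization": during training, weights are quantized and immediately dequantized so the forward pass experiences quantization noise while the optimizer keeps updating full-precision weights. This numpy toy shows only the forward side; real frameworks also need a straight-through estimator in the backward pass. The specific weight values are made up for illustration.

```python
# QAT forward-pass sketch: quantize-dequantize in one step, so downstream
# layers see weights snapped to the quantization grid.
import numpy as np

def fake_quantize(w, bits=8):
    qmax = 2 ** (bits - 1) - 1
    scale = np.abs(w).max() / qmax
    return np.round(w / scale) * scale    # quantize then dequantize

w = np.array([0.11, -0.42, 0.73, -0.05], dtype=np.float32)
w_q = fake_quantize(w, bits=4)
print("original:            ", w)
print("seen by forward pass:", w_q)       # same shape/dtype, snapped to grid
```

Because the stored weights stay in full precision, the model can gradually adapt to the rounding noise, which is why QAT typically recovers more accuracy at very low bit widths than pure PTQ.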
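To make the "deploy massive LLMs on limited hardware" claim concrete, here is the back-of-envelope memory math behind 4-bit file formats such as GGUF's Q4 variants. The ~4.5 bits/weight figure is a rough assumption covering the per-block scales that quantized formats store alongside the integers; real file sizes vary by quantization type and metadata.

```python
# Rough memory estimate for a 7B-parameter model: fp16 vs ~4-bit quantized.
params = 7e9
fp16_gb = params * 2 / 1e9        # fp16 = 2 bytes per weight
q4_gb = params * 4.5 / 8 / 1e9    # ~4.5 bits/weight incl. block scales (assumed)
print(f"fp16: {fp16_gb:.1f} GB, Q4: {q4_gb:.1f} GB")
```

This is the gap that lets a model which needs a 16 GB GPU in fp16 run on a laptop CPU via llama.cpp after quantization.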

