LLM Fine-Tuning 13: LLM Quantization Explained (PART 2) | PTQ, QAT, GPTQ, AWQ, GGUF, GGML, llama.cpp

Sunny Savita · Beginner · 🧠 Large Language Models · 7mo ago
Welcome to Episode 13 of the LLM Fine-Tuning Series: Quantization Part 2! In this video, we move beyond the basics and explore the advanced quantization techniques that power today's large language models (LLMs). If you've watched Part 1 (Introduction to Quantization), you're ready to dive into real-world algorithms and formats such as GPTQ, AWQ, GGML, GGUF, and QAT.

What You'll Learn in This Video
1️⃣ Advanced LLM Quantization Overview – why scaling models up requires smarter compression
2️⃣ GPTQ (Post-Training Quantization for Generative Pre-trained Transformers) – how it reduces memory while preserving accuracy
3️⃣ AWQ (Activation-Aware Weight Quantization) – minimizing quantization error with activation-guided calibration
4️⃣ Quantization-Aware Training (QAT) – teaching models to train under quantization noise
5️⃣ GGML (Georgi Gerganov's Machine Learning library) – a lightweight inference engine for CPUs and edge devices
6️⃣ GGUF (GGML Universal File format) – a standardized, future-proof model file format for LLMs
7️⃣ Hands-on Practical Demo with LLMs – applying quantization and measuring speed and memory gains
8️⃣ Choosing the Right Technique – when to use GPTQ, AWQ, GGUF, or QAT

By the end of this video, you'll be equipped to deploy massive LLMs on limited hardware at high speed and lower cost.

Material & Resources: https://github.com/sunnysavita10/Complete-LLM-Finetuning/tree/main/LLM-Quantization-Part-2

🔔 Like, Share & Subscribe to stay updated with the full LLM fine-tuning playlist. Got questions or topic requests? Drop a comment below 👇

⏱️ Timestamps / Chapters:
00:00 Introduction to LLM Quantization
06:13 Advanced LLM Quantization Concepts
27:19 PTQ vs QAT for Large Language Models
44:52 GPTQ Theory
01:35:09 GPTQ Hands-on Practical Example
02:10:49 AWQ Theory (Activation-Aware Quantization)
02:27:07 AWQ Hands-on Practical Example
02:35:30 QAT (Quantization-Aware Training) Theory + Practical
02:56:17 GGML & GGUF Explained for LLM Inference
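As a warm-up for the PTQ portion of the video, here is a minimal sketch of the simplest post-training quantization scheme: symmetric absmax round-to-nearest on a weight matrix. This is the naive baseline that methods like GPTQ improve on. The shapes, bit width, and values are illustrative assumptions, not code from the video.

```python
# Toy post-training quantization (PTQ): symmetric round-to-nearest int4
# quantization of a weight matrix with a single absmax scale.
# All shapes and values are made up for illustration.
import numpy as np

def quantize_symmetric(w: np.ndarray, bits: int = 4):
    """Quantize weights to signed integers with one absmax scale."""
    qmax = 2 ** (bits - 1) - 1            # e.g. 7 for int4
    scale = np.abs(w).max() / qmax        # absmax scaling factor
    q = np.clip(np.round(w / scale), -qmax - 1, qmax).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
w = rng.normal(0, 0.02, size=(16, 16)).astype(np.float32)
q, scale = quantize_symmetric(w, bits=4)
w_hat = dequantize(q, scale)
err = np.abs(w - w_hat).mean()            # mean rounding error per weight
print(f"mean abs quantization error: {err:.6f}")
```

GPTQ's contribution, covered in the theory chapter, is to quantize weights one by one while compensating the remaining weights for the rounding error, instead of rounding everything independently as above.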
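The AWQ chapters center on one key observation: weight channels that see large activations matter most, and they can be protected by scaling them up before quantization while folding the inverse scale into the activations. The sketch below illustrates that idea in numpy; the `s = 2.0` scaling rule, the per-column quantizer, and all values are simplifying assumptions, not the exact AWQ algorithm.

```python
# Toy AWQ-style channel scaling: the transform x -> x/s, W -> s*W is
# mathematically output-preserving, so it can be applied before quantization
# to shrink the relative rounding error on "salient" channels.
import numpy as np

def quantize_per_column(w, bits=4):
    """Round-to-nearest fake quantization with one scale per output column."""
    qmax = 2 ** (bits - 1) - 1
    scale = np.abs(w).max(axis=0, keepdims=True) / qmax
    q = np.clip(np.round(w / scale), -qmax - 1, qmax)
    return q * scale                      # dequantized ("fake-quantized") weights

rng = np.random.default_rng(1)
w = rng.uniform(-1, 1, size=(16, 8))
w[0] *= 0.3                               # salient channel: small weights...
x = np.ones(16); x[0] = 50.0              # ...but very large activations

s = np.where(np.abs(x) > 10, 2.0, 1.0)    # scale up salient channels (toy rule)
w_scaled = w * s[:, None]
x_scaled = x / s

# Before quantization, the scaling changes nothing:
assert np.allclose(x @ w, x_scaled @ w_scaled)

err_plain = np.abs(x @ w - x @ quantize_per_column(w)).mean()
err_awq = np.abs(x @ w - x_scaled @ quantize_per_column(w_scaled)).mean()
print(f"plain RTN output error: {err_plain:.4f}, with channel scaling: {err_awq:.4f}")
```

Real AWQ searches for the scaling factors using calibration activations and quantizes in groups along the input channels, but the output-preserving scaling trick is the same.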
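For the QAT chapter, the core mechanism is "fake quantization": during training, weights are quantized and immediately dequantized so the forward pass experiences quantization noise while the optimizer keeps updating full-precision weights. This numpy toy shows only the forward side; real frameworks also need a straight-through estimator in the backward pass. The specific weight values are made up for illustration.

```python
# QAT forward-pass sketch: quantize-dequantize in one step, so downstream
# layers see weights snapped to the quantization grid.
import numpy as np

def fake_quantize(w, bits=8):
    qmax = 2 ** (bits - 1) - 1
    scale = np.abs(w).max() / qmax
    return np.round(w / scale) * scale    # quantize then dequantize

w = np.array([0.11, -0.42, 0.73, -0.05], dtype=np.float32)
w_q = fake_quantize(w, bits=4)
print("original:            ", w)
print("seen by forward pass:", w_q)       # same shape/dtype, snapped to grid
```

Because the stored weights stay in full precision, the model can gradually adapt to the rounding noise, which is why QAT typically recovers more accuracy at very low bit widths than pure PTQ.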
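To make the "deploy massive LLMs on limited hardware" claim concrete, here is the back-of-envelope memory math behind 4-bit file formats such as GGUF's Q4 variants. The ~4.5 bits/weight figure is a rough assumption covering the per-block scales that quantized formats store alongside the integers; real file sizes vary by quantization type and metadata.

```python
# Rough memory estimate for a 7B-parameter model: fp16 vs ~4-bit quantized.
params = 7e9
fp16_gb = params * 2 / 1e9        # fp16 = 2 bytes per weight
q4_gb = params * 4.5 / 8 / 1e9    # ~4.5 bits/weight incl. block scales (assumed)
print(f"fp16: {fp16_gb:.1f} GB, Q4: {q4_gb:.1f} GB")
```

This is the gap that lets a model which needs a 16 GB GPU in fp16 run on a laptop CPU via llama.cpp after quantization.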

