Reverse-engineering GGUF | Post-Training Quantization
Key Takeaways
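This video explains post-training quantization through the GGUF ecosystem: the GGML, llama.cpp, and GGUF stack; legacy quants, K-quants, and I-quants; the importance matrix; and mixed-precision variants.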
Original Description
The first comprehensive explainer for the GGUF quantization ecosystem.
GGUF quantization is currently the most popular approach to post-training quantization. Strictly speaking, GGUF is a binary file format for quantized models; it sits on top of GGML (a lean PyTorch alternative) and llama.cpp (an LLM inference engine).
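To make "binary file format" concrete, here is a minimal sketch of reading the fixed-size header at the start of a GGUF file. It assumes the layout used by GGUF versions 2 and 3 (a 4-byte magic `GGUF`, a 32-bit version, then 64-bit tensor and metadata key-value counts, all little-endian). The function name `read_gguf_header` is hypothetical, and the bytes in the usage example are synthetic rather than taken from a real model.

```python
import struct

# Assumed header layout (GGUF v2/v3, little-endian):
#   4-byte magic b"GGUF", uint32 version, uint64 tensor count, uint64 metadata KV count
HEADER_FORMAT = "<4sIQQ"
HEADER_SIZE = struct.calcsize(HEADER_FORMAT)  # 24 bytes


def read_gguf_header(data: bytes) -> dict:
    """Parse the fixed-size header from the first bytes of a GGUF file (hypothetical helper)."""
    if len(data) < HEADER_SIZE:
        raise ValueError("buffer too short for a GGUF header")
    magic, version, tensor_count, kv_count = struct.unpack_from(HEADER_FORMAT, data, 0)
    if magic != b"GGUF":
        raise ValueError("not a GGUF file: bad magic")
    return {
        "version": version,
        "tensor_count": tensor_count,
        "metadata_kv_count": kv_count,
    }


# Usage with synthetic bytes; for a real file, pass open(path, "rb").read(HEADER_SIZE).
example = struct.pack(HEADER_FORMAT, b"GGUF", 3, 291, 24)
print(read_gguf_header(example))
```

The metadata key-value pairs and tensor descriptors follow this header; parsing them is out of scope for this sketch.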
Due to its ad-hoc open-source nature, GGUF is poorly documented and misunderstood. Currently, information is scattered across Reddit threads and GitHub pull requests.
📌 Main topics covered in this video:
- The ecosystem: GGML, llama.cpp, GGUF
- Legacy quants vs K-quants vs I-quants
- The importance matrix
- Mixed precision (_S, _M, _L, _XL variants)
If you enjoyed this video, watch my entire series on model quantization: https://www.youtube.com/playlist?list=PL4bm2lr9UVG0HvePBXvsceO4yuLC8HhUh
📬 Have feedback or spotted an error? Contribute to the GitHub repo or leave a comment!
https://github.com/iuliaturc/gguf-docs
00:00 Intro
01:36 The stack: GGML, llama.cpp, GGUF
04:05 End-to-end workflow
05:29 Overview: Legacy, K-quants, I-quants
06:03 Legacy quants (Type 0, Type 1)
10:57 K-quants
13:43 I-quants
17:42 Importance Matrix
22:51 Recap
23:35 Mixed precision (_S, _M, _L, _XL)
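As a rough illustration of the "Legacy quants (Type 0, Type 1)" chapter, the sketch below contrasts the two schemes on a single block of weights. It assumes the convention used in llama.cpp discussions: Type 0 reconstructs a weight as `w ≈ d * q` (scale only), and Type 1 as `w ≈ d * q + m` (scale plus block minimum). The function names, the rounding rule, and the scale selection are simplifications chosen for clarity; they are not the exact llama.cpp kernels, which also pack the 4-bit values into bytes.

```python
def quantize_type0(block, bits=4):
    """Symmetric sketch: store one scale d, reconstruct w ≈ d * q."""
    qmax = 2 ** (bits - 1) - 1                      # 7 for 4 bits
    amax = max(abs(w) for w in block)
    d = amax / qmax if amax > 0 else 1.0            # avoid division by zero
    q = [max(-qmax - 1, min(qmax, round(w / d))) for w in block]
    return d, q


def dequantize_type0(d, q):
    return [d * qi for qi in q]


def quantize_type1(block, bits=4):
    """Asymmetric sketch: store scale d and minimum m, reconstruct w ≈ d * q + m."""
    levels = 2 ** bits - 1                          # 15 for 4 bits
    lo, hi = min(block), max(block)
    d = (hi - lo) / levels if hi > lo else 1.0      # avoid division by zero
    q = [max(0, min(levels, round((w - lo) / d))) for w in block]
    return d, lo, q


def dequantize_type1(d, m, q):
    return [d * qi + m for qi in q]


def max_abs_error(original, restored):
    return max(abs(a - b) for a, b in zip(original, restored))


# One block of 32 weights (the block size assumed here), all non-negative.
block = [i / 10 for i in range(32)]

d0, q0 = quantize_type0(block)
d1, m1, q1 = quantize_type1(block)

err0 = max_abs_error(block, dequantize_type0(d0, q0))
err1 = max_abs_error(block, dequantize_type1(d1, m1, q1))
print(f"Type 0 max error: {err0:.4f}")
print(f"Type 1 max error: {err1:.4f}")
```

On a skewed block like this one, the extra minimum stored by Type 1 lets all 16 levels cover the actual value range, at the cost of one more stored constant per block.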