Reverse-engineering GGUF | Post-Training Quantization
The first comprehensive explainer for the GGUF quantization ecosystem.
GGUF quantization is currently the most popular approach to post-training quantization. GGUF itself is a binary file format for quantized models, sitting on top of GGML (a lean C/C++ tensor library, roughly a minimal PyTorch alternative) and llama.cpp (an LLM inference engine).
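As a concrete illustration of GGUF being "just a binary file format": a minimal sketch that reads the fixed header fields at the start of a GGUF file (magic bytes, format version, tensor count, metadata key/value count). The field layout follows the public GGUF spec; the function name and return shape are this sketch's own choices, not part of any library API.

```python
import struct

def read_gguf_header(path):
    """Read the fixed-size GGUF header: magic, version, tensor and KV counts."""
    with open(path, "rb") as f:
        magic = f.read(4)                            # b"GGUF" for valid files
        if magic != b"GGUF":
            raise ValueError("not a GGUF file")
        version, = struct.unpack("<I", f.read(4))    # little-endian uint32 format version
        n_tensors, = struct.unpack("<Q", f.read(8))  # uint64: number of tensors
        n_kv, = struct.unpack("<Q", f.read(8))       # uint64: number of metadata key/value pairs
    return {"version": version, "tensors": n_tensors, "kv_pairs": n_kv}
```

After these header fields come the metadata key/value pairs (model architecture, tokenizer, quantization type, etc.) and then the tensor data itself.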
Because of its ad-hoc, open-source evolution, GGUF is poorly documented and widely misunderstood: information is currently scattered across Reddit threads and GitHub pull requests.
📌 Main topics covered in this video:
- The ecosystem: GGML, llama.cpp, GGUF
- Legacy quants vs K-quants vs I-quants
- Th…
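To ground the quant-family terminology above: the legacy "type 0" schemes (e.g. Q4_0) store each block as a single scale, value ≈ scale × q, while "type 1" schemes (e.g. Q4_1) add a per-block offset, value ≈ scale × q + min. A simplified NumPy sketch of both ideas follows; it uses a 32-value block and plain rounding, and deliberately omits llama.cpp's exact rounding and packing details.

```python
import numpy as np

BLOCK_SIZE = 32  # legacy quants operate on fixed-size blocks of weights

def quantize_type0(block, bits=4):
    """'Type 0' (symmetric): value ~ scale * q, no offset."""
    qmax = 2 ** (bits - 1) - 1                      # 7 for 4-bit
    amax = np.max(np.abs(block))
    scale = amax / qmax if amax > 0 else 1.0
    q = np.clip(np.round(block / scale), -qmax - 1, qmax).astype(np.int8)
    return scale, q

def dequantize_type0(scale, q):
    return scale * q.astype(np.float32)

def quantize_type1(block, bits=4):
    """'Type 1' (asymmetric): value ~ scale * q + min, offset captures the block minimum."""
    levels = 2 ** bits - 1                          # 15 for 4-bit
    mn, mx = float(np.min(block)), float(np.max(block))
    scale = (mx - mn) / levels if mx > mn else 1.0
    q = np.clip(np.round((block - mn) / scale), 0, levels).astype(np.uint8)
    return scale, mn, q

def dequantize_type1(scale, mn, q):
    return scale * q.astype(np.float32) + mn
```

The offset in type 1 helps when a block's values are not centered around zero, at the cost of storing one extra float per block.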
Watch on YouTube ↗
Chapters (10)
- Intro (1:36)
- The stack: GGML, llama.cpp, GGUF (4:05)
- End-to-end workflow (5:29)
- Overview: Legacy, K-quants, I-quants (6:03)
- Legacy quants (Type 0, Type 1) (10:57)
- K-quants (13:43)
- I-quants (17:42)
- Importance Matrix (22:51)
- Recap (23:35)
- Mixed precision (_S, _M, _L, _XL)
DeepCamp AI