Reverse-engineering GGUF | Post-Training Quantization
The first comprehensive explainer for the GGUF quantization ecosystem.
GGUF quantization is currently the most popular approach to Post-Training Quantization. Strictly speaking, GGUF itself is a binary file format for quantized models, sitting on top of GGML (a lean PyTorch alternative) and llama.cpp (an LLM inference engine).
Due to its ad-hoc open-source nature, GGUF is poorly documented and misunderstood. Currently, information is scattered across Reddit threads and GitHub pull requests.
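Because GGUF is a binary file format, the start of a file can be inspected with a few lines of code. The sketch below assumes the version 2+ header layout (4 magic bytes `GGUF`, then a little-endian uint32 version, a uint64 tensor count, and a uint64 metadata key/value count). The function name `parse_gguf_header` and the synthetic header bytes are made up for illustration; they are not part of llama.cpp or any GGUF library.

```python
import struct

GGUF_MAGIC = b"GGUF"  # the first four bytes of a GGUF file


def parse_gguf_header(data: bytes) -> dict:
    """Parse the fixed-size prefix of a GGUF file (assumes the v2+ layout)."""
    if data[:4] != GGUF_MAGIC:
        raise ValueError("not a GGUF file: bad magic bytes")
    # Little-endian, no padding: uint32 version, uint64 tensor count,
    # uint64 metadata key/value count.
    version, tensor_count, kv_count = struct.unpack_from("<IQQ", data, 4)
    return {
        "version": version,
        "tensor_count": tensor_count,
        "metadata_kv_count": kv_count,
    }


# Synthetic header bytes for illustration only (not from a real model file).
fake_header = GGUF_MAGIC + struct.pack("<IQQ", 3, 291, 24)
header = parse_gguf_header(fake_header)
print(header)
```

For a real model, passing the first 24 bytes of the file to the same function should work under the layout assumption above; the metadata key/value pairs and tensor descriptors that follow the header are not parsed here.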
📌 Main topics covered in this video:
- The ecosystem: GGML, llama.cpp, GGUF
- Legacy quants vs K-quants vs I-quants
- The importance matrix
- Mixed precision (_S, _M, _L, _XL variants)
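As a rough illustration of the legacy quants in the list above: "Type 0" stores one scale per block of weights (w ≈ d · q), while "Type 1" stores a scale plus a minimum (w ≈ d · q + m). The sketch below is a simplified pure-Python version under stated assumptions: blocks of 32 weights, 8-bit signed codes for Type 0, and 4-bit unsigned codes for Type 1. The function names are hypothetical, and this is not llama.cpp's actual kernel code, which packs bits and stores scales in half precision.

```python
BLOCK_SIZE = 32  # assumption: weights are quantized in blocks of 32


def quantize_type0(block, bits=8):
    """Symmetric ("Type 0"): w ~ d * q, with one scale d per block."""
    qmax = 2 ** (bits - 1) - 1              # 127 for 8 bits
    amax = max(abs(w) for w in block)
    d = amax / qmax if amax > 0 else 1.0    # avoid dividing by zero
    q = [round(w / d) for w in block]
    return d, q


def dequantize_type0(d, q):
    return [d * qi for qi in q]


def quantize_type1(block, bits=4):
    """Asymmetric ("Type 1"): w ~ d * q + m, with scale d and minimum m per block."""
    qmax = 2 ** bits - 1                    # 15 for 4 bits
    m = min(block)
    span = max(block) - m
    d = span / qmax if span > 0 else 1.0
    # Clamp to the representable range to guard against rounding at the edges.
    q = [min(qmax, max(0, round((w - m) / d))) for w in block]
    return d, m, q


def dequantize_type1(d, m, q):
    return [d * qi + m for qi in q]


# Toy block of 32 weights, skewed towards positive values.
block = [(i - 10) / 16 for i in range(BLOCK_SIZE)]

d0, q0 = quantize_type0(block)
err0 = max(abs(w - r) for w, r in zip(block, dequantize_type0(d0, q0)))

d1, m1, q1 = quantize_type1(block)
err1 = max(abs(w - r) for w, r in zip(block, dequantize_type1(d1, m1, q1)))

print(f"Type 0 (8-bit): scale={d0:.5f}, max abs error={err0:.5f}")
print(f"Type 1 (4-bit): scale={d1:.5f}, min={m1:.5f}, max abs error={err1:.5f}")
```

In both cases the reconstruction error per weight is bounded by half the scale, which is why the extra minimum in Type 1 helps when a block's values are not centred around zero: the codes then cover only the range the block actually occupies.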
If you enjoyed this video, watch my entire series on model quantization: https://www.youtube.com/playlist?list=PL4bm2lr9UVG0HvePBXvsceO4yuLC8HhUh
📬 Have feedback or spotted an error? Contribute to the GitHub repo or leave a comment!
https://github.com/iuliaturc/gguf-docs
00:00 Intro
01:36 The stack: GGML, llama.cpp, GGUF
04:05 End-to-end workflow
05:29 Overview: Legacy, K-quants, I-quants
06:03 Legacy quants (Type 0, Type 1)
10:57 K-quants
13:43 I-quants
17:42 Importance Matrix
22:51 Recap
23:35 Mixed precision (_S, _M, _L, _XL)