Reverse-engineering GGUF | Post-Training Quantization

Julia Turc · Beginner · 🧠 Large Language Models · 11mo ago

Key Takeaways

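This video explains Post-Training Quantization through the GGUF ecosystem: the GGML, llama.cpp, and GGUF stack, the three quant families (legacy, K-quants, I-quants), the importance matrix, and the mixed-precision variants.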

Original Description

The first comprehensive explainer for the GGUF quantization ecosystem. GGUF quantization is currently the most popular tool for Post-Training Quantization. GGUF is actually a binary file format for quantized models, sitting on top of GGML (a lean PyTorch alternative) and llama.cpp (an LLM inference engine). Due to its ad-hoc open-source nature, GGUF is poorly documented and misunderstood. Currently, information is scattered across Reddit threads and GitHub pull requests.

📌 Main topics covered in this video:
- The ecosystem: GGML, llama.cpp, GGUF
- Legacy quants vs K-quants vs I-quants
- The importance matrix
- Mixed precision (_S, _M, _L, _XL variants)

If you enjoyed this video, watch my entire series on model quantization: https://www.youtube.com/playlist?list=PL4bm2lr9UVG0HvePBXvsceO4yuLC8HhUh

📬 Have feedback or spotted an error? Contribute to the GitHub repo or leave a comment! https://github.com/iuliaturc/gguf-docs

00:00 Intro
01:36 The stack: GGML, llama.cpp, GGUF
04:05 End-to-end workflow
05:29 Overview: Legacy, K-quants, I-quants
06:03 Legacy quants (Type 0, Type 1)
10:57 K-quants
13:43 I-quants
17:42 Importance Matrix
22:51 Recap
23:35 Mixed precision (_S, _M, _L, _XL)
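
Since the description calls GGUF a binary file format, a small sketch of reading the start of such a file may help make that concrete. It assumes the little-endian layout used by GGUF version 2 and later, as I understand it from the public spec: a 4-byte magic `GGUF`, a 32-bit version, then 64-bit tensor and metadata key-value counts. The function name and the dictionary keys are illustrative choices of mine, not taken from the video or from llama.cpp.

```python
import struct

GGUF_MAGIC = b"GGUF"
HEADER_SIZE = 24  # 4 (magic) + 4 (version) + 8 (tensor count) + 8 (KV count)


def read_gguf_header(data: bytes) -> dict:
    """Parse the fixed-size prefix of a GGUF file.

    Assumption: little-endian, version >= 2 layout (64-bit counts).
    This is a sketch and does not parse metadata or tensor info.
    """
    if len(data) < HEADER_SIZE:
        raise ValueError("buffer too short to hold a GGUF header")
    if data[:4] != GGUF_MAGIC:
        raise ValueError("not a GGUF file: bad magic bytes")
    # "<" = little-endian without padding, I = uint32, Q = uint64
    version, tensor_count, kv_count = struct.unpack_from("<IQQ", data, 4)
    return {
        "version": version,
        "tensor_count": tensor_count,
        "metadata_kv_count": kv_count,
    }
```

To try it on a real model, read the first 24 bytes of the file in binary mode and pass them to `read_gguf_header`; everything after that prefix (metadata, tensor descriptors, weights) is outside the scope of this sketch.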

Chapters (10)

0:00 Intro
1:36 The stack: GGML, llama.cpp, GGUF
4:05 End-to-end workflow
5:29 Overview: Legacy, K-quants, I-quants
6:03 Legacy quants (Type 0, Type 1)
10:57 K-quants
13:43 I-quants
17:42 Importance Matrix
22:51 Recap
23:35 Mixed precision (_S, _M, _L, _XL)
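
The "Type 0" and "Type 1" labels in the legacy quants chapter are commonly described as two reconstruction rules for a block of weights: Type 0 stores one scale per block (w ≈ d · q), while Type 1 stores a scale and a minimum (w ≈ d · q + m). Below is a simplified sketch of that idea. It is not bit-exact with llama.cpp, it uses plain Python floats instead of packed 4-bit storage, and all function names are hypothetical.

```python
def quantize_type0(block, bits=4):
    """Symmetric block quantization: w ~= d * q, with q a signed integer."""
    qmax = 2 ** (bits - 1) - 1                 # 7 for 4 bits
    amax = max(abs(w) for w in block)
    d = amax / qmax if amax > 0 else 1.0       # one scale per block
    q = [max(-qmax - 1, min(qmax, round(w / d))) for w in block]
    return d, q


def dequantize_type0(d, q):
    return [d * qi for qi in q]


def quantize_type1(block, bits=4):
    """Asymmetric block quantization: w ~= d * q + m, with q unsigned."""
    levels = 2 ** bits - 1                     # 15 for 4 bits
    m = min(block)                             # per-block minimum (offset)
    span = max(block) - m
    d = span / levels if span > 0 else 1.0     # per-block scale
    q = [max(0, min(levels, round((w - m) / d))) for w in block]
    return d, m, q


def dequantize_type1(d, m, q):
    return [d * qi + m for qi in q]


def max_abs_error(original, restored):
    return max(abs(a - b) for a, b in zip(original, restored))
```

For a block whose values all sit on one side of zero, the Type 1 rule spends its integer levels on the actual value range, which is why it tends to reconstruct such blocks more closely at the cost of storing the extra minimum per block.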