Reverse-engineering GGUF | Post-Training Quantization

Julia Turc · Beginner · 📐 ML Fundamentals · 8mo ago
The first comprehensive explainer of the GGUF quantization ecosystem. GGUF quantization is currently the most popular approach to post-training quantization. GGUF itself is a binary file format for quantized models; it sits on top of GGML (a lean, C-based PyTorch alternative) and llama.cpp (an LLM inference engine). Because it grew out of ad-hoc open-source work, GGUF is poorly documented and widely misunderstood, with information scattered across Reddit threads and GitHub pull requests.

📌 Main topics covered in this video:
- The ecosystem: GGML, llama.cpp, GGUF
- Legacy quants vs K-quants vs I-quants
- Th…
Watch on YouTube ↗
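Since GGUF is a binary container format, its header is easy to inspect directly. A minimal sketch of parsing the GGUF header (per the GGUF spec in the ggml repository: 4-byte magic `GGUF`, a uint32 version, then uint64 tensor and metadata key/value counts, all little-endian); the example builds a fake header in memory rather than reading a real model file, and the counts used are made up for illustration:

```python
import struct

def parse_gguf_header(data: bytes) -> dict:
    # Header layout per the GGUF spec: magic, version, tensor count, kv count.
    magic, version, n_tensors, n_kv = struct.unpack_from("<4sIQQ", data, 0)
    if magic != b"GGUF":
        raise ValueError("not a GGUF file")
    return {"version": version, "tensor_count": n_tensors, "kv_count": n_kv}

# Fake in-memory header (version 3, 291 tensors, 24 metadata entries).
fake = struct.pack("<4sIQQ", b"GGUF", 3, 291, 24)
print(parse_gguf_header(fake))
# → {'version': 3, 'tensor_count': 291, 'kv_count': 24}
```

The metadata key/value section that follows the header is what lets tools like llama.cpp load a model without any sidecar config files.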

Chapters (10)

0:00 Intro
1:36 The stack: GGML, llama.cpp, GGUF
4:05 End-to-end workflow
5:29 Overview: Legacy, K-quants, I-quants
6:03 Legacy quants (Type 0, Type 1)
10:57 K-quants
13:43 I-quants
17:42 Importance Matrix
22:51 Recap
23:35 Mixed precision (_S, _M, _L, _XL)