Reverse-engineering GGUF | Post-Training Quantization
The first comprehensive explainer for the GGUF quantization ecosystem.
GGUF quantization is currently the most popular approach to post-training quantization. GGUF itself is a binary file format for quantized models, sitting on top of GGML (a lean C/C++ tensor library, roughly a minimal PyTorch alternative) and llama.cpp (an LLM inference engine).
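As a concrete illustration of GGUF being "just a binary file format": a minimal sketch that reads the fixed header fields at the start of a GGUF file (magic bytes, format version, tensor count, metadata key/value count). The field layout follows the public GGUF spec; the function name and return shape are this sketch's own choices, not part of any library API.

```python
import struct

def read_gguf_header(path):
    """Read the fixed-size GGUF header: magic, version, tensor and KV counts."""
    with open(path, "rb") as f:
        magic = f.read(4)                            # b"GGUF" for valid files
        if magic != b"GGUF":
            raise ValueError("not a GGUF file")
        version, = struct.unpack("<I", f.read(4))    # little-endian uint32 format version
        n_tensors, = struct.unpack("<Q", f.read(8))  # uint64: number of tensors
        n_kv, = struct.unpack("<Q", f.read(8))       # uint64: number of metadata key/value pairs
    return {"version": version, "tensors": n_tensors, "kv_pairs": n_kv}
```

After these header fields come the metadata key/value pairs (model architecture, tokenizer, quantization type, etc.) and then the tensor data itself.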
Because of its ad-hoc, open-source evolution, GGUF is poorly documented and widely misunderstood: information is currently scattered across Reddit threads and GitHub pull requests.
📌 Main topics covered in this video:
- The ecosystem: GGML, llama.cpp, GGUF
- Legacy quants vs K-quants vs I-quants
- Th…
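To ground the quant-family terminology above: the legacy "type 0" schemes (e.g. Q4_0) store each block as a single scale, value ≈ scale × q, while "type 1" schemes (e.g. Q4_1) add a per-block offset, value ≈ scale × q + min. A simplified NumPy sketch of both ideas follows; it uses a 32-value block and plain rounding, and deliberately omits llama.cpp's exact rounding and packing details.

```python
import numpy as np

BLOCK_SIZE = 32  # legacy quants operate on fixed-size blocks of weights

def quantize_type0(block, bits=4):
    """'Type 0' (symmetric): value ~ scale * q, no offset."""
    qmax = 2 ** (bits - 1) - 1                      # 7 for 4-bit
    amax = np.max(np.abs(block))
    scale = amax / qmax if amax > 0 else 1.0
    q = np.clip(np.round(block / scale), -qmax - 1, qmax).astype(np.int8)
    return scale, q

def dequantize_type0(scale, q):
    return scale * q.astype(np.float32)

def quantize_type1(block, bits=4):
    """'Type 1' (asymmetric): value ~ scale * q + min, offset captures the block minimum."""
    levels = 2 ** bits - 1                          # 15 for 4-bit
    mn, mx = float(np.min(block)), float(np.max(block))
    scale = (mx - mn) / levels if mx > mn else 1.0
    q = np.clip(np.round((block - mn) / scale), 0, levels).astype(np.uint8)
    return scale, mn, q

def dequantize_type1(scale, mn, q):
    return scale * q.astype(np.float32) + mn
```

The offset in type 1 helps when a block's values are not centered around zero, at the cost of storing one extra float per block.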
Watch on YouTube ↗
Chapters (10)
- Intro (1:36)
- The stack: GGML, llama.cpp, GGUF (4:05)
- End-to-end workflow (5:29)
- Overview: Legacy, K-quants, I-quants (6:03)
- Legacy quants (Type 0, Type 1) (10:57)
- K-quants (13:43)
- I-quants (17:42)
- Importance Matrix (22:51)
- Recap (23:35)
- Mixed precision (_S, _M, _L, _XL)
DeepCamp AI