Which Quantization Method is Right for You? (GPTQ vs. GGUF vs. AWQ)

Maarten Grootendorst · Beginner · 🧠 Large Language Models · 2y ago
In this tutorial, we will explore several methods for loading pre-quantized models, such as Zephyr 7B. We will cover the three common quantization formats: GPTQ, GGUF (formerly GGML), and AWQ.

📒 Google Colab notebook: https://colab.research.google.com/drive/1rt318Ew-5dDw21YZx2zK2vnxbsuDAchH?usp=sharing
🛠️ Written version of this tutorial: https://maartengrootendorst.substack.com/p/which-quantization-method-is-right
🤗 Zephyr 7B on Hugging Face …

Chapters (8)

0:00 Introduction
0:25 Loading Zephyr 7B
3:25 Quantization
7:42 Pre-quantized LLMs
8:42 GPTQ
10:29 GGUF
12:22 AWQ
14:46 Outro
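The core idea shared by all three formats in the chapters above is mapping float weights to low-bit integers plus a scale factor. As a rough illustration (not the actual GPTQ/GGUF/AWQ algorithms, which add per-group scales, activation awareness, and error compensation), here is a minimal absmax int8 quantization sketch:

```python
import numpy as np

def quantize_absmax(weights: np.ndarray):
    """Symmetric (absmax) int8 quantization: one scale per tensor.

    The largest-magnitude weight maps to +/-127; everything else is
    rounded to the nearest representable step.
    """
    scale = np.abs(weights).max() / 127.0
    q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover approximate float weights from the int8 codes."""
    return q.astype(np.float32) * scale

w = np.array([0.5, -1.2, 0.03, 2.0], dtype=np.float32)
q, scale = quantize_absmax(w)
w_hat = dequantize(q, scale)
print(q)      # int8 codes, e.g. [ 32 -76   2 127]
print(w_hat)  # approximate reconstruction of w
```

The reconstruction error per weight is bounded by half the scale step, which is why 4-bit schemes (with only 16 levels) need the finer per-group tricks these methods provide.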