Understanding Model Quantization and Distillation in LLMs

AppliedAI · Intermediate · 🧠 Large Language Models · 1y ago
Learn how model quantization and distillation, two key techniques for compressing large models, help reduce costs and improve efficiency when deploying AI models. In this video, we'll explore:

- Why compress models? The high cost of deploying large models and the need for optimization.
- What is quantization? Reducing model size by lowering parameter precision (e.g., from float32 to float16 or int8) to save storage and speed up inference.
- What is distillation? Training a smaller "student" model to mimic the behavior of a larger "teacher" model, achieving similar performance with less computation.
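As a rough sketch of the quantization idea described above (not the video's exact method), the snippet below applies symmetric int8 quantization to a small weight matrix with NumPy: a single scale factor maps the largest absolute weight to 127, weights are stored as 1-byte integers, and dequantizing recovers an approximation of the originals. All names here are illustrative.

```python
import numpy as np

np.random.seed(0)  # deterministic toy weights
weights = np.random.randn(4, 4).astype(np.float32)  # pretend model weights

# Symmetric quantization: map the largest |weight| to the int8 limit (127).
scale = np.abs(weights).max() / 127.0
q_weights = np.round(weights / scale).astype(np.int8)  # 1 byte per weight

# Dequantize at inference time to approximate the original float32 values.
deq = q_weights.astype(np.float32) * scale
max_err = np.abs(weights - deq).max()

print(weights.nbytes, q_weights.nbytes)  # 64 vs 16 bytes: 4x smaller storage
```

Round-to-nearest keeps the per-weight error within half a quantization step (scale / 2), which is why int8 often preserves accuracy well despite the 4x size reduction over float32.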
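And as a minimal sketch of the distillation loss (a common formulation, not necessarily the one shown in the video), the student is trained to match the teacher's temperature-softened output distribution; the snippet below computes that KL-divergence term for one example with hypothetical logits. The temperature value and logits are assumptions for illustration.

```python
import numpy as np

def softmax(z, T=1.0):
    """Temperature-scaled softmax; higher T gives softer probabilities."""
    z = z / T
    z = z - z.max(axis=-1, keepdims=True)  # for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

# Hypothetical logits for one input from a large teacher and a small student.
teacher_logits = np.array([2.0, 1.0, 0.1])
student_logits = np.array([1.5, 0.8, 0.3])

T = 2.0  # temperature > 1 softens distributions, exposing relative class similarities
p_teacher = softmax(teacher_logits, T)
p_student = softmax(student_logits, T)

# Distillation loss: KL divergence from the teacher's soft targets to the student.
kd_loss = np.sum(p_teacher * (np.log(p_teacher) - np.log(p_student)))
print(kd_loss >= 0.0)  # KL divergence is always non-negative
```

In practice this soft-target term is combined with the ordinary cross-entropy on the true labels, and its gradient drives the student's outputs toward the teacher's.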
Watch on YouTube ↗
Next Up
5 Levels of AI Agents - From Simple LLM Calls to Multi-Agent Systems
Dave Ebbelaar (LLM Eng)