This Simple Optimizer Is Revolutionizing How We Train AI [Muon]

Uploaded: 2025-10-14T01:56:38+00:00
Channel: Jia-Bin Huang

Jia-Bin Huang · Beginner · 🧠 Large Language Models · 5mo ago

The Muon optimizer has shown remarkable performance in accelerating machine learning model training, often outperforming the widely used AdamW optimizer. In this video, we will cover the basic concept of how Muon works and discuss some recent improvements that let it scale to large LLM training.

00:00 Why Muon?
00:36 Reviewing Adam
02:13 Linear layer
04:24 Solving orthogonalization with SVD
06:28 Newton-Schulz iteration - Odd polynomial matrix
08:11 Newton-Schulz iteration - Example
10:35 The Muon optimizer
11:49 The exploding attention logit crisis
15:13 MuonClip: Extending…
