BIG Mistake in Adam | Adam vs AdamW

Build AI with Sandeep · Beginner · 🧠 Large Language Models · 3w ago
In this video we clearly explain the difference between the Adam optimizer and the AdamW optimizer used in deep learning and machine learning. Many people use Adam without understanding how weight decay and L2 regularization behave inside adaptive optimizers. This video explains:
• Why momentum uses the mean of gradients
• Why RMSProp uses squared gradients
• What weight decay actually means
• How L2 regularization changes the gradient
• Why Adam mixes weight decay incorrectly
• How AdamW fixes the problem with decoupled weight decay
This topic is important for anyone working in: Deep Learning, Machin…
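To make the distinction concrete, here is a minimal sketch of a single update step for both variants. The function names and hyperparameter defaults are illustrative, not from the video: in the Adam + L2 version, the decay term `wd * w` is folded into the gradient, so it also passes through the adaptive scaling by the second moment; in AdamW, the decay is applied directly to the weights, outside the adaptive update.

```python
import numpy as np

def adam_l2_step(w, g, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999,
                 eps=1e-8, wd=0.01):
    """Adam with L2 regularization: decay is mixed into the gradient,
    so it gets rescaled by the adaptive denominator sqrt(v_hat)."""
    g = g + wd * w                        # L2 term enters the gradient
    m = beta1 * m + (1 - beta1) * g       # first moment: mean of gradients
    v = beta2 * v + (1 - beta2) * g**2    # second moment: mean of squared gradients
    m_hat = m / (1 - beta1**t)            # bias correction
    v_hat = v / (1 - beta2**t)
    w = w - lr * m_hat / (np.sqrt(v_hat) + eps)
    return w, m, v

def adamw_step(w, g, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999,
               eps=1e-8, wd=0.01):
    """AdamW: decoupled weight decay, applied to the weights directly
    and never scaled by the adaptive denominator."""
    m = beta1 * m + (1 - beta1) * g
    v = beta2 * v + (1 - beta2) * g**2
    m_hat = m / (1 - beta1**t)
    v_hat = v / (1 - beta2**t)
    w = w - lr * (m_hat / (np.sqrt(v_hat) + eps) + wd * w)  # decay added here
    return w, m, v
```

Running one step from the same starting point shows the two optimizers move the weight by different amounts, which is exactly the coupling problem the video discusses: in Adam + L2, weights with large gradient history receive effectively less decay.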