BIG Mistake in Adam | Adam vs AdamW
In this video we clearly explain the difference between the Adam and AdamW optimizers used in deep learning and machine learning.
Many people use Adam without understanding how weight decay and L2 regularization behave inside adaptive optimizers. This video explains:
• Why momentum uses mean of gradients
• Why RMSProp uses squared gradients
• What weight decay actually means
• How L2 regularization changes the gradient
• Why Adam mixes weight decay incorrectly
• How AdamW fixes the problem with decoupled weight decay (a short code sketch follows this list)
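The contrast the video walks through can be summarized in two update rules. Below is a minimal Python/NumPy sketch (not taken from the video; function names and hyperparameter values are illustrative) showing Adam with L2 regularization folded into the gradient versus AdamW with decoupled weight decay:

# Minimal sketch: one parameter-update step for Adam with L2 regularization
# folded into the gradient, versus AdamW where weight decay is applied
# directly to the weights (decoupled). Hyperparameters lr, beta1, beta2,
# eps, weight_decay are the standard ones; values here are only examples.
import numpy as np

def adam_l2_step(w, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999,
                 eps=1e-8, weight_decay=1e-2):
    # Adam with L2: the decay term enters the gradient, so it also gets
    # rescaled by the adaptive (squared-gradient) denominator.
    g = grad + weight_decay * w             # L2 regularization changes the gradient
    m = beta1 * m + (1 - beta1) * g         # momentum: running mean of gradients
    v = beta2 * v + (1 - beta2) * g**2      # RMSProp-style running mean of squared gradients
    m_hat = m / (1 - beta1**t)              # bias correction
    v_hat = v / (1 - beta2**t)
    w = w - lr * m_hat / (np.sqrt(v_hat) + eps)
    return w, m, v

def adamw_step(w, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999,
               eps=1e-8, weight_decay=1e-2):
    # AdamW: weight decay is decoupled from the gradient and applied
    # directly to the weights, so it is not scaled by the adaptive term.
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad**2
    m_hat = m / (1 - beta1**t)
    v_hat = v / (1 - beta2**t)
    w = w - lr * m_hat / (np.sqrt(v_hat) + eps) - lr * weight_decay * w
    return w, m, v

The only difference is where weight_decay appears: inside the gradient (and therefore divided by the adaptive sqrt(v_hat) term) for Adam with L2, versus applied directly to the weights for AdamW.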
This topic is important for anyone working in:
Deep Learning
Machine Learning
DeepCamp AI