From SGD to LAMB: A Deep Engineering Walkthrough of Modern Optimizers
📰 Medium · Deep Learning
Why we moved from “just gradient descent” to layer-wise trust ratios — and what each step actually fixes. Continue reading on Medium »
Why we moved from “just gradient descent” to layer-wise trust ratios — and what each step actually fixes. Continue reading on Medium »