Gefen: Optimized Stochastic Optimizer

📰 ArXiv cs.AI

arXiv:2606.13894v1 (cross-listed)

Abstract: AdamW is a default optimizer for modern deep learning, but its first and second moment states add roughly two parameter-sized buffers to training memory. We propose Gefen, a memory-efficient optimizer that automatically shares second-moment estimates across parameter blocks and quantizes the first moment using a learned codebook, thereby reducing AdamW's memory footprint by ~8x while maintaining the same performance, corresponding to a reduction
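The memory claim in the abstract can be made concrete with back-of-the-envelope arithmetic. The sketch below is not Gefen's implementation: it compares the optimizer-state size of plain AdamW (two fp32 buffers) with a hypothetical layout that keeps one second-moment scalar per block and one codebook index per parameter for the first moment. The block size of 1024, the 256-entry codebook, the single shared codebook, and the function names are all assumptions chosen for illustration; the excerpt does not state the paper's actual settings.

```python
import math


def adamw_state_bytes(n_params: int, bytes_per_value: int = 4) -> int:
    """AdamW keeps two parameter-sized buffers: first and second moment."""
    return 2 * n_params * bytes_per_value


def compressed_state_bytes(
    n_params: int,
    block_size: int,
    codebook_size: int,
    bytes_per_value: int = 4,
) -> int:
    """State size for an assumed (hypothetical) compressed layout.

    - second moment: one shared scalar per block of `block_size` parameters
    - first moment: one codebook index per parameter, bit-packed
    - codebook: `codebook_size` full-precision values, shared by all parameters
    """
    n_blocks = math.ceil(n_params / block_size)
    second_moment = n_blocks * bytes_per_value
    # Bits needed to address any codebook entry (integer arithmetic, no floats).
    bits_per_index = (codebook_size - 1).bit_length()
    first_moment_indices = math.ceil(n_params * bits_per_index / 8)
    codebook = codebook_size * bytes_per_value
    return second_moment + first_moment_indices + codebook


n = 1_000_000_000  # a 1B-parameter model, chosen only as an example
baseline = adamw_state_bytes(n)
compressed = compressed_state_bytes(n, block_size=1024, codebook_size=256)
ratio = baseline / compressed

print(f"AdamW state:      {baseline / 1e9:.2f} GB")
print(f"Compressed state: {compressed / 1e9:.2f} GB")
print(f"Reduction:        {ratio:.2f}x")
```

Under these assumed settings the per-parameter index dominates the compressed state, so the reduction is governed mostly by the index width: 8-bit indices against 8 bytes of fp32 AdamW state give a ratio just under 8x, and the block-shared second moment contributes well under one percent of the total.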

Published 15 Jun 2026
