Singularity-aware Optimization via Randomized Geometric Probing: Towards Stable Non-smooth Optimization
📰 ArXiv cs.AI
arXiv:2605.29547v1 (Announce Type: cross)
Abstract: Deep learning optimization relies heavily on the assumption of smooth loss landscapes, a condition systematically violated by modern architectures due to non-smooth components such as ReLU activations and quantization operators. In such non-smooth regimes, adaptive optimizers such as Adam suffer from gradient chattering: violent oscillations caused by conflicting signals within the Clarke subdifferential, leading to poor convergence and suboptima
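The chattering described in the abstract can be seen on the simplest non-smooth function, f(x) = |x|, whose Clarke subdifferential at the kink x = 0 is the whole interval [-1, 1]. The sketch below is only an illustration and is not the paper's method: it uses plain fixed-step subgradient descent rather than Adam, and the names `subgradient_abs`, `fixed_step_descent`, and `count_sign_flips` are hypothetical helpers introduced here.

```python
def subgradient_abs(x):
    """Return one element of the Clarke subdifferential of f(x) = |x|."""
    if x > 0:
        return 1.0
    if x < 0:
        return -1.0
    # At the kink any value in [-1, 1] is valid; 0.0 is one arbitrary choice.
    return 0.0


def fixed_step_descent(x0, step, n_steps):
    """Run fixed-step subgradient descent on |x| and return all iterates."""
    xs = [x0]
    for _ in range(n_steps):
        x = xs[-1]
        xs.append(x - step * subgradient_abs(x))
    return xs


def count_sign_flips(xs):
    """Count consecutive iterate pairs that lie on opposite sides of the kink."""
    return sum(1 for a, b in zip(xs, xs[1:]) if a * b < 0)


if __name__ == "__main__":
    trajectory = fixed_step_descent(x0=0.25, step=0.1, n_steps=20)
    print("last iterates:", trajectory[-4:])
    print("sign flips:", count_sign_flips(trajectory))
```

With these assumed settings the iterates stop approaching the minimizer once they are within one step of it and instead hop between roughly +0.05 and -0.05, because the subgradient on either side of the kink has magnitude 1 and opposite sign. Shrinking the step size reduces the amplitude of the hopping but does not remove it.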