Second-Order Multi-Level Variance Correction for Modality Competition in Multimodal Models
📰 ArXiv cs.AI
arXiv:2605.16165v1 Announce Type: cross
Abstract: Autoregressive next-token training offers a unified formulation for image generation and text understanding, but it also creates strong modality competition that destabilizes optimization and limits large-batch scaling. We show that first-order optimizers such as AdamW are vulnerable to cross-modality gradient heterogeneity, while second-order preconditioning, particularly SOAP, provides a more stable basis for multimodal alignment. Building on t