Preservation Is Not Enough for Width Growth: Regime-Sensitive Selection of Dense LM Warm Starts

📰 ArXiv cs.AI

arXiv:2604.04281v1 (announce type: new)

Abstract: Width expansion offers a practical route to reuse smaller causal-language-model checkpoints, but selecting a widened warm start is not solved by zero-step preservation alone. We study dense width growth as a candidate-selection problem over full training states, including copied weights, optimizer moments, and scheduler state. In a small-scale TinyStories proxy, we compare exact-copy, perturbative, asymmetric-reset, and structured non-clone warm st

Published 7 Apr 2026