Preservation Is Not Enough for Width Growth: Regime-Sensitive Selection of Dense LM Warm Starts

📰 ArXiv cs.AI

When widening a dense language model, preserving the original model's behavior at step zero is not enough: the warm start has to be selected with the training regime in mind.

Advanced · Published 7 Apr 2026

Action Steps

Frame width expansion as a candidate-selection problem over full training states
Compare warm start strategies, including exact-copy, perturbative, asymmetric-reset, and structured non-clone methods
Evaluate each strategy on a small-scale proxy, such as TinyStories
Select the strategy that performs best in the regime you plan to train in, then apply it to larger-scale language model training
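
To make the first two strategies concrete, here is a minimal sketch of an exact-copy and a perturbative warm start for a toy two-layer ReLU network, along with a check of how well each preserves the small model's outputs at step zero. It is an illustration under stated assumptions, not the paper's procedure: the duplicate-and-rescale scheme is the familiar Net2Net-style widening, and the names `widen` and `preservation_gap` are hypothetical. The asymmetric-reset and structured non-clone strategies are not shown because this summary does not define them.

```python
import random

def relu(v):
    return [max(0.0, a) for a in v]

def matvec(W, x):
    return [sum(w * a for w, a in zip(row, x)) for row in W]

def forward(W1, W2, x):
    """Toy two-layer network: y = W2 @ relu(W1 @ x)."""
    return matvec(W2, relu(matvec(W1, x)))

def widen(W1, W2, new_width, noise=0.0, rng=None):
    """Grow the hidden layer to new_width by duplicating hidden units.

    noise == 0.0 gives an exact-copy warm start (outputs unchanged).
    noise > 0.0 gives a perturbative warm start: the incoming weights
    of the added units get Gaussian noise, so outputs shift slightly.
    (Illustrative assumption, not the paper's definition.)
    """
    if rng is None:
        rng = random.Random(0)
    old_width = len(W1)
    assert new_width >= old_width
    # Each added unit is a copy of a randomly chosen existing unit.
    sources = list(range(old_width))
    sources += [rng.randrange(old_width) for _ in range(new_width - old_width)]
    counts = [sources.count(j) for j in range(old_width)]

    new_W1 = []
    for i, j in enumerate(sources):
        row = list(W1[j])
        if i >= old_width and noise > 0.0:
            row = [w + rng.gauss(0.0, noise) for w in row]
        new_W1.append(row)

    # Split each outgoing weight evenly across all copies of its source
    # unit, so the copies together contribute what the original did.
    new_W2 = [[row[j] / counts[j] for j in sources] for row in W2]
    return new_W1, new_W2

def preservation_gap(small, wide, inputs):
    """Largest absolute output difference between two models on inputs."""
    return max(
        abs(a - b)
        for x in inputs
        for a, b in zip(forward(*small, x), forward(*wide, x))
    )
```

A zero gap only says that the widened model starts from the same function as the small one. Per the title, that check alone should not decide the winner, so in practice each candidate would also be trained for a short budget on the proxy and compared on the resulting loss.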

Who Needs to Know This

ML researchers and engineers who develop or scale language models, particularly those growing a trained model's width rather than training a wider one from scratch, and who need a principled way to choose among warm starts.

Key Insight

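💡 Preserving the smaller model's function at step zero does not by itself make a good warm start; for width growth, the choice among dense LM warm starts should depend on the training regime.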