Preservation Is Not Enough for Width Growth: Regime-Sensitive Selection of Dense LM Warm Starts

📰 arXiv cs.AI

Growing a language model's width calls for regime-sensitive selection among dense warm starts; preserving the model's function at step zero is not enough on its own.

Difficulty: advanced · Published 7 Apr 2026
Action Steps
  1. Frame width expansion as a candidate-selection problem over full training states
  2. Compare warm-start strategies, including exact-copy, perturbative, asymmetric-reset, and structured non-clone methods
  3. Evaluate each strategy on a small-scale proxy task such as TinyStories
  4. Choose the warm start that performs best in the target training regime, then apply it to larger-scale language model training
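The steps above can be illustrated on a toy model. What follows is a minimal sketch under stated assumptions, not the paper's method: it assumes a one-hidden-layer ReLU MLP, implements only two of the four strategy families (an exact-copy widening in the style of Net2WiderNet, which clones hidden units and splits their outgoing weights, and a perturbative variant that adds Gaussian noise to the clones), and substitutes a tiny synthetic regression task for the TinyStories proxy. The names `widen_copy`, `sgd_steps`, and `select_warm_start` are hypothetical.

```python
import math
import random

# Hypothetical toy setup (not from the paper): a one-hidden-layer ReLU MLP
# with scalar output, stored as plain lists: params = (W1, b1, w2, b2).
def forward(params, x):
    W1, b1, w2, b2 = params
    h = [max(0.0, sum(w * xi for w, xi in zip(row, x)) + b) for row, b in zip(W1, b1)]
    return sum(w * hj for w, hj in zip(w2, h)) + b2, h


def mse(params, data):
    return sum((forward(params, x)[0] - y) ** 2 for x, y in data) / len(data)


def widen_copy(params, new_units, noise=0.0, rng=None):
    """Add `new_units` hidden units by cloning existing ones (Net2WiderNet-style).

    Each clone copies its source's incoming weights and bias, and the source's
    outgoing weight is split in half between source and clone. With noise=0 the
    widened net computes the same function up to float rounding (exact-copy);
    noise > 0 jitters the clones' incoming weights (a perturbative variant).
    """
    W1, b1, w2, b2 = params
    assert new_units <= len(b1), "this sketch clones each source unit at most once"
    rng = rng or random.Random(0)
    W1, b1, w2 = [row[:] for row in W1], list(b1), list(w2)
    for j in range(new_units):
        clone_in = [w + (rng.gauss(0.0, noise) if noise > 0 else 0.0) for w in W1[j]]
        W1.append(clone_in)
        b1.append(b1[j])
        w2[j] /= 2.0      # split the outgoing weight between source...
        w2.append(w2[j])  # ...and clone, so their summed contribution is unchanged
    return W1, b1, w2, b2


def sgd_steps(params, data, epochs, lr):
    """Plain per-sample SGD on squared error; returns a trained copy."""
    W1, b1, w2, b2 = params
    W1, b1, w2 = [row[:] for row in W1], list(b1), list(w2)
    for _ in range(epochs):
        for x, y in data:
            out, h = forward((W1, b1, w2, b2), x)
            g = 2.0 * (out - y)  # d(squared error)/d(out)
            for j, hj in enumerate(h):
                if hj > 0.0:  # ReLU passes gradient only when the unit is active
                    gh = g * w2[j]  # uses w2[j] before it is updated below
                    W1[j] = [w - lr * gh * xi for w, xi in zip(W1[j], x)]
                    b1[j] -= lr * gh
                w2[j] -= lr * g * hj
            b2 -= lr * g
    return W1, b1, w2, b2


def select_warm_start(candidates, data, epochs, lr):
    """Rank candidates by loss after a short proxy run, not by zero-step loss."""
    scores = {name: mse(sgd_steps(p, data, epochs, lr), data) for name, p in candidates.items()}
    return min(scores, key=scores.get), scores


rng = random.Random(42)
base = ([[rng.gauss(0.0, 0.5) for _ in range(2)] for _ in range(4)],
        [0.0] * 4, [rng.gauss(0.0, 0.5) for _ in range(4)], 0.0)
points = [(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(32)]
data = [((a, b), math.sin(3 * a) + 0.5 * b) for a, b in points]

candidates = {
    "exact_copy": widen_copy(base, 4),
    "perturbative": widen_copy(base, 4, noise=0.05, rng=random.Random(1)),
}
# The exact copy should match the base model's loss at step zero.
print(abs(mse(candidates["exact_copy"], data) - mse(base, data)) < 1e-9)
best, scores = select_warm_start(candidates, data, epochs=20, lr=0.02)
print(best, scores)
```

In this toy, exact clones receive identical gradients and therefore never diverge under SGD, so preserving the function at step zero does not by itself yield usable extra capacity; the perturbative variant breaks that symmetry at the cost of a small step-zero change. Re-running `select_warm_start` with different `epochs` or `lr` values is one way to check whether the ranking shifts with the training regime.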
Who Needs to Know This

ML researchers and engineers working on language model development and fine-tuning can use these results to grow model width more effectively and to choose better warm starts.

Key Insight

💡 Regime-sensitive selection of dense LM warm starts, not zero-step preservation alone, is what makes width growth effective
