OptiMer: Optimal Distribution Vector Merging Is Better than Data Mixing for Continual Pre-Training

📰 arXiv cs.AI

OptiMer is a new method for continual pre-training (CPT) of LLMs that merges per-dataset distribution vectors and outperforms data-mixing approaches.

Advanced · Published 1 Apr 2026
Action Steps
  1. Train a separate CPT model for each dataset
  2. Extract the distribution vector from each model
  3. Merge the distribution vectors using OptiMer (see the sketch after this list)
  4. Fine-tune the merged model for the target task
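
The paper's exact procedure is not reproduced here, but a minimal sketch of steps 2–4 might look like the following, assuming a "distribution vector" is the parameter delta between a CPT model and the shared base model (analogous to a task vector). The function names and the merge weights `alphas` are illustrative, not taken from the paper.

```python
# Hedged sketch of steps 2-4. Assumes a "distribution vector" is the
# per-parameter delta theta_cpt - theta_base (analogous to a task vector);
# all names here are illustrative, not taken from the OptiMer paper.
from typing import Dict, List
import torch

StateDict = Dict[str, torch.Tensor]

def distribution_vector(base: StateDict, cpt: StateDict) -> StateDict:
    """Step 2: extract the per-parameter delta of a CPT model from the base."""
    return {name: cpt[name] - base[name] for name in base}

def merge_vectors(vectors: List[StateDict], alphas: List[float]) -> StateDict:
    """Step 3: weighted sum of distribution vectors. OptiMer would choose
    the weights optimally; here the caller supplies them."""
    merged = {name: torch.zeros_like(t) for name, t in vectors[0].items()}
    for vec, alpha in zip(vectors, alphas):
        for name in merged:
            merged[name] += alpha * vec[name]
    return merged

def apply_vector(base: StateDict, merged: StateDict) -> StateDict:
    """Add the merged vector back onto the base parameters; the result is
    the model one would then fine-tune for the target task (step 4)."""
    return {name: base[name] + merged[name] for name in base}
```
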
Who Needs to Know This

ML researchers and engineers working on LLMs and continual pre-training can benefit from OptiMer: it simplifies adapting models to target languages and domains.

Key Insight

💡 Decoupling ratio selection from training can improve both the efficiency and the effectiveness of continual pre-training: merge weights can be tuned after the models are trained, instead of committing to a data-mixing ratio before every run.
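
To make the decoupling concrete, here is a toy illustration (every tensor and weight below is hypothetical, not from the paper): once the distribution vectors exist, sweeping candidate merge ratios costs only a cheap weighted sum per candidate, whereas data mixing would pay a full pre-training run for each ratio.

```python
import torch

# Toy illustration: two hypothetical distribution vectors over a
# 2-parameter "model". None of these values come from the paper.
base = {"w": torch.zeros(2)}
vec_lang = {"w": torch.tensor([1.0, 0.0])}    # e.g. target-language delta
vec_domain = {"w": torch.tensor([0.0, 1.0])}  # e.g. target-domain delta

for a_lang, a_dom in [(0.3, 0.7), (0.5, 0.5), (0.7, 0.3)]:
    merged = {k: base[k] + a_lang * vec_lang[k] + a_dom * vec_domain[k]
              for k in base}
    # Each candidate ratio costs one weighted sum plus an evaluation;
    # under data mixing, each ratio would require retraining from scratch.
    print(a_lang, a_dom, merged["w"])
```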

Share This
🚀 OptiMer: a new approach to continual pre-training for LLMs that merges distribution vectors instead of mixing data, for better performance.