OptiMer: Optimal Distribution Vector Merging Is Better than Data Mixing for Continual Pre-Training

📰 arXiv cs.AI

OptiMer is a new method for continual pre-training (CPT) of LLMs that merges per-dataset distribution vectors and outperforms data-mixing approaches.

Advanced · Published 1 Apr 2026
Action Steps
  1. Train a separate CPT model for each dataset
  2. Extract the distribution vector from each model
  3. Merge the distribution vectors using OptiMer (see the sketch after this list)
  4. Fine-tune the merged model for the target task
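
The paper's exact procedure is not reproduced here, but a minimal sketch of steps 2–4 might look like the following, assuming a "distribution vector" is the parameter delta between a CPT model and the shared base model (analogous to a task vector). The function names and the merge weights `alphas` are illustrative, not taken from the paper.

```python
# Hedged sketch of steps 2-4. Assumes a "distribution vector" is the
# per-parameter delta theta_cpt - theta_base (analogous to a task vector);
# all names here are illustrative, not taken from the OptiMer paper.
from typing import Dict, List
import torch

StateDict = Dict[str, torch.Tensor]

def distribution_vector(base: StateDict, cpt: StateDict) -> StateDict:
    """Step 2: extract the per-parameter delta of a CPT model from the base."""
    return {name: cpt[name] - base[name] for name in base}

def merge_vectors(vectors: List[StateDict], alphas: List[float]) -> StateDict:
    """Step 3: weighted sum of distribution vectors. OptiMer would choose
    the weights optimally; here the caller supplies them."""
    merged = {name: torch.zeros_like(t) for name, t in vectors[0].items()}
    for vec, alpha in zip(vectors, alphas):
        for name in merged:
            merged[name] += alpha * vec[name]
    return merged

def apply_vector(base: StateDict, merged: StateDict) -> StateDict:
    """Add the merged vector back onto the base parameters; the result is
    the model one would then fine-tune for the target task (step 4)."""
    return {name: base[name] + merged[name] for name in base}
```
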
Who Needs to Know This

ML researchers and engineers working on LLMs and continual pre-training can benefit from OptiMer: it simplifies adapting models to target languages and domains.

Key Insight

💡 Decoupling ratio selection from training can improve both the efficiency and the effectiveness of continual pre-training: merge weights can be tuned after the models are trained, instead of committing to a data-mixing ratio before every run.
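
To make the decoupling concrete, here is a toy illustration (every tensor and weight below is hypothetical, not from the paper): once the distribution vectors exist, sweeping candidate merge ratios costs only a cheap weighted sum per candidate, whereas data mixing would pay a full pre-training run for each ratio.

```python
import torch

# Toy illustration: two hypothetical distribution vectors over a
# 2-parameter "model". None of these values come from the paper.
base = {"w": torch.zeros(2)}
vec_lang = {"w": torch.tensor([1.0, 0.0])}    # e.g. target-language delta
vec_domain = {"w": torch.tensor([0.0, 1.0])}  # e.g. target-domain delta

for a_lang, a_dom in [(0.3, 0.7), (0.5, 0.5), (0.7, 0.3)]:
    merged = {k: base[k] + a_lang * vec_lang[k] + a_dom * vec_domain[k]
              for k in base}
    # Each candidate ratio costs one weighted sum plus an evaluation;
    # under data mixing, each ratio would require retraining from scratch.
    print(a_lang, a_dom, merged["w"])
```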

Share This
🚀 OptiMer: a new approach to continual pre-training for LLMs that merges distribution vectors instead of mixing data, for better performance.