OptiMer: Optimal Distribution Vector Merging Is Better than Data Mixing for Continual Pre-Training
📰 ArXiv cs.AI
OptiMer is a method for continual pre-training (CPT) of LLMs that merges per-dataset distribution vectors instead of mixing the training data, and it outperforms data-mixing approaches
Action Steps
- Train a separate CPT model for each dataset
- Extract the distribution vector from each model
- Merge the distribution vectors using OptiMer (see the sketch after this list)
- Fine-tune the merged model for the target task
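The workflow above resembles task-vector merging. Here is a minimal sketch, assuming a "distribution vector" is the weight delta between each per-dataset CPT model and the shared base model, and that merging is a weighted sum of those deltas; the weights stand in for OptiMer's optimal coefficients, and the helper names are illustrative, not the paper's API.

```python
import torch


def distribution_vector(cpt_state, base_state):
    """Parameter delta between a CPT model and the base model.

    Assumption: the paper's 'distribution vector' is a task-vector-style
    weight delta. Both arguments are torch state dicts (name -> Tensor).
    """
    return {name: cpt_state[name] - base_state[name] for name in base_state}


def merge(base_state, vectors, weights):
    """Add a weighted sum of distribution vectors back onto the base weights."""
    merged = {name: param.clone() for name, param in base_state.items()}
    for vector, weight in zip(vectors, weights):
        for name in merged:
            merged[name] += weight * vector[name]
    return merged
```

In this view, each dataset contributes one delta, and the merge step replaces the data-mixing ratio with a per-dataset coefficient applied in weight space.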
Who Needs to Know This
ML researchers and engineers working on LLMs and continual pre-training can benefit from OptiMer, as it simplifies the process of adapting models to target languages and domains
Key Insight
💡 Decoupling mixing-ratio selection from training (choosing merge weights for already-trained models rather than fixing data-mixing ratios before training) can improve both the efficiency and the effectiveness of continual pre-training
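A minimal sketch of what that decoupling buys, under the same assumptions as the sketch above: because the distribution vectors are fixed once training is done, candidate mixing ratios can be scored as merge coefficients against a validation metric without rerunning pre-training. Here merge_fn and eval_fn are hypothetical callbacks (e.g., the merge helper sketched earlier and a validation-loss function), not the paper's interface.

```python
from itertools import product


def select_merge_weights(base_state, vectors, candidate_weights, merge_fn, eval_fn):
    """Grid-search merge coefficients post hoc.

    Each candidate weight tuple merges the fixed distribution vectors
    (merge_fn) and is scored by eval_fn (lower is better); no additional
    pre-training runs are needed.
    """
    best_weights, best_score = None, float("inf")
    for weights in product(candidate_weights, repeat=len(vectors)):
        score = eval_fn(merge_fn(base_state, vectors, list(weights)))
        if score < best_score:
            best_weights, best_score = list(weights), score
    return best_weights
```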
Share This
🚀 OptiMer: a new approach to continual pre-training for LLMs, merging distribution vectors for better performance
DeepCamp AI