English is Not All You Need: Systematically Exploring the Role of Multilinguality in LLM Post-Training
📰 ArXiv cs.AI
arXiv:2604.13286v1 Announce Type: cross Abstract: Despite the widespread multilingual deployment of large language models, post-training pipelines remain predominantly English-centric, contributing to performance disparities across languages. We present a systematic, controlled study of the interplay between training language coverage, model scale, and task domain, based on 220 supervised fine-tuning runs on parallel translated multilingual data mixtures spanning mathematical reasoning and API c