MGSM-Pro: A Simple Strategy for Robust Multilingual Mathematical Reasoning Evaluation

📰 ArXiv cs.AI

arXiv:2601.21225v2 Abstract: Large language models have made substantial progress in mathematical reasoning. However, benchmark development for multilingual evaluation has lagged behind English in both difficulty and recency. Recently, GSM-Symbolic showed strong evidence of high variance when models are evaluated on different instantiations of the same question; however, the evaluation was conducted only in English. In this paper, we introduce MGSM-Pro, an extensio

Published 29 Apr 2026