GAIA-v2-LILT: Multilingual Adaptation of Agent Benchmark beyond Translation

📰 ArXiv cs.AI

arXiv:2604.24929v1 Announce Type: cross Abstract: Agent benchmarks remain largely English-centric, while their multilingual versions are often built with machine translation (MT) and limited post-editing. We argue that, for agentic tasks, this minimal workflow can easily break benchmark validity through query-answer misalignment or culturally off-target context. We propose a refined workflow for adapting English benchmarks into multiple languages with explicit functional alignment and cultural alignment …

Published 29 Apr 2026