CulturALL: Benchmarking Multilingual and Multicultural Competence of LLMs on Grounded Tasks
📰 ArXiv cs.AI
arXiv:2604.19262v1 Announce Type: cross
Abstract: Large language models (LLMs) are now deployed worldwide, inspiring a surge of benchmarks that measure their multilingual and multicultural abilities. However, these benchmarks prioritize generic language understanding or superficial cultural trivia, leaving the evaluation of grounded tasks -- where models must reason within real-world, context-rich scenarios -- largely unaddressed. To fill this gap, we present CulturALL, a comprehensive and chall