MPR-GUI: Benchmarking and Enhancing Multilingual Perception and Reasoning in GUI Agents

arXiv cs.AI

arXiv:2512.00756v2 (Announce Type: replace)

Abstract: Large Vision-Language Models (LVLMs) have shown strong potential as multilingual Graphical User Interface (GUI) agents, as evidenced by existing GUI benchmarks. However, these benchmarks exhibit two primary limitations: (1) although Perception and Reasoning (P&R) capabilities are fundamental for GUI agents, current benchmarks lack fine-grained diagnostics to identify which specific capabilities lead to task failures, hindering targeted improvem…

Published 29 Apr 2026