MPR-GUI: Benchmarking and Enhancing Multilingual Perception and Reasoning in GUI Agents
📰 ArXiv cs.AI
arXiv:2512.00756v2 (Announce Type: replace)
Abstract: Large Vision-Language Models (LVLMs) have shown strong potential as multilingual Graphical User Interface (GUI) agents, as evidenced by existing GUI benchmarks. However, these benchmarks exhibit two primary limitations: (1) although Perception and Reasoning (P&R) capabilities are fundamental for GUI agents, current benchmarks lack fine-grained diagnostics to identify which specific capabilities lead to task failures, hindering targeted improvem