MGA: Memory-Driven GUI Agent for Observation-Centric Interaction

📰 ArXiv cs.AI

arXiv:2510.24168v2

Abstract: Multimodal Large Language Models (MLLMs) have significantly advanced GUI agents, yet long-horizon automation remains constrained by two critical bottlenecks: context overload from raw sequential trajectory dependence and architectural redundancy from over-engineered expert modules. Prevailing End-to-End and Multi-Agent paradigms struggle with error cascades caused by concatenated visual-textual histories and incur high inference latency due to

Published 14 Apr 2026