Towards Long-horizon Agentic Multimodal Search

📰 ArXiv cs.AI

arXiv:2604.12890v1 Announce Type: cross Abstract: Multimodal deep search agents have shown great potential in solving complex tasks by iteratively collecting textual and visual evidence. However, managing the heterogeneous information and high token costs associated with multimodal inputs over long horizons remains a critical challenge, as existing methods often suffer from context explosion or the loss of crucial visual signals. To address this, we propose a novel Long-horizon MultiModal deep s

Published 15 Apr 2026
Read full paper → ← Back to Reads