WARC-Bench: Web Archive Based Benchmark for GUI Subtask Executions

arXiv cs.AI

arXiv:2510.09872v2 (Announce Type: replace-cross)

Abstract: Training web agents to navigate complex, real-world websites requires them to master subtasks: short-horizon interactions with multiple UI components (e.g., choosing the correct date in a date picker, or scrolling within a container to extract information). We introduce WARC-Bench (Web Archive Benchmark), a novel web navigation benchmark featuring 438 tasks designed to evaluate multimodal AI agents on subtasks. WARC-Bench enables sa…

Published 20 May 2026
