WARC-Bench: Web Archive Based Benchmark for GUI Subtask Executions
📰 ArXiv cs.AI
arXiv:2510.09872v2. Abstract: Training web agents to navigate complex, real-world websites requires them to master subtasks: short-horizon interactions on multiple UI components (e.g., choosing the correct date in a date picker, or scrolling in a container to extract information). We introduce WARC-Bench (Web Archive Benchmark), a novel web navigation benchmark featuring 438 tasks designed to evaluate multimodal AI agents on subtasks. WARC-Bench enables sa