A Functionality-Grounded Benchmark for Evaluating Web Agents in E-commerce Domains

📰 ArXiv cs.AI

arXiv:2508.15832v2 Announce Type: replace-cross Abstract: Web agents have shown great promise in performing many tasks on e-commerce websites. To assess their capabilities, several benchmarks have been introduced. However, current benchmarks in the e-commerce domain face two major problems. First, they primarily focus on product search tasks (e.g., "Find an Apple Watch"), failing to capture the broader range of functionalities offered by real-world e-commerce platforms such as Amazon, including acco

Published 22 Apr 2026
