Why Do LLM-based Web Agents Fail? A Hierarchical Planning Perspective
📰 ArXiv cs.AI
arXiv:2603.14248v2
Abstract: Large language model (LLM) web agents are increasingly used for web navigation but remain far from human reliability on realistic, long-horizon tasks. Existing evaluations focus primarily on end-to-end success, offering limited insight into where failures arise. We propose a hierarchical planning framework to analyze web agents across three layers (i.e., high-level planning, low-level execution, and replanning), enabling process-based evaluatio