Why Do LLM-based Web Agents Fail? A Hierarchical Planning Perspective

📰 arXiv cs.AI

arXiv:2603.14248v2 · Announce Type: replace

Abstract: Large language model (LLM) web agents are increasingly used for web navigation but remain far from human reliability on realistic, long-horizon tasks. Existing evaluations focus primarily on end-to-end success, offering limited insight into where failures arise. We propose a hierarchical planning framework to analyze web agents across three layers (i.e., high-level planning, low-level execution, and replanning), enabling process-based evaluation…
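The abstract contrasts end-to-end success with process-based evaluation across three layers but does not describe how steps are annotated. The sketch below is purely illustrative and assumes that each step of an agent trajectory can be labeled with the layer it belongs to and whether it succeeded. `Layer`, `StepRecord`, `failures_by_layer`, and `first_failure` are hypothetical names introduced here, not taken from the paper.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Layer(Enum):
    """The three layers named in the abstract."""
    HIGH_LEVEL_PLANNING = "high-level planning"
    LOW_LEVEL_EXECUTION = "low-level execution"
    REPLANNING = "replanning"


@dataclass(frozen=True)
class StepRecord:
    """One annotated agent step (hypothetical schema, not from the paper)."""
    layer: Layer
    succeeded: bool


def failures_by_layer(trajectory: list[StepRecord]) -> dict[str, int]:
    """Count failed steps per layer; layers with no failures report 0."""
    counts = Counter(step.layer for step in trajectory if not step.succeeded)
    return {layer.value: counts[layer] for layer in Layer}


def first_failure(trajectory: list[StepRecord]) -> Optional[Layer]:
    """Return the layer of the earliest failed step, or None if none failed."""
    for step in trajectory:
        if not step.succeeded:
            return step.layer
    return None


# Toy trajectory: the plan is fine, an execution step fails,
# and the replanning attempt does not recover.
trajectory = [
    StepRecord(Layer.HIGH_LEVEL_PLANNING, succeeded=True),
    StepRecord(Layer.LOW_LEVEL_EXECUTION, succeeded=False),
    StepRecord(Layer.REPLANNING, succeeded=False),
    StepRecord(Layer.LOW_LEVEL_EXECUTION, succeeded=False),
]
task_succeeded = False  # the only signal an end-to-end metric records

print("end-to-end success:", task_succeeded)
print(failures_by_layer(trajectory))
# {'high-level planning': 0, 'low-level execution': 2, 'replanning': 1}
print(first_failure(trajectory))
```

An outcome-only metric reduces this toy trajectory to a single failed task, while the per-layer view shows that planning held up and execution was where things first went wrong. How the paper itself attributes failures to layers is not stated in the abstract.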

Published 29 Apr 2026