Why Do LLM-based Web Agents Fail? A Hierarchical Planning Perspective

📰 arXiv cs.AI

Learn why LLM-based web agents fail and how to evaluate them using a hierarchical planning framework

Advanced · Published 29 Apr 2026

Action Steps

Analyze web agent failures using a hierarchical planning framework
Evaluate high-level planning in LLM-based web agents
Assess low-level execution in web agents
Implement replanning mechanisms to handle failures
Apply process-based evaluation to identify areas for improvement
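As a rough illustration of process-based evaluation, the sketch below attributes each failed trajectory to the first layer that broke, instead of only recording pass or fail. The `StepRecord` schema, its boolean annotations, and the attribution rule are hypothetical choices for this example, not the paper's protocol. Under this rule, a planning error takes precedence, and an execution error counts against replanning only if the agent tried to replan and still did not recover.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Layer(Enum):
    HIGH_LEVEL_PLANNING = "high-level planning"
    LOW_LEVEL_EXECUTION = "low-level execution"
    REPLANNING = "replanning"


@dataclass
class StepRecord:
    """One annotated step of an agent trajectory (hypothetical schema)."""
    subgoal: str
    plan_ok: bool     # the subgoal is a valid step toward the task
    action_ok: bool   # the browser action actually achieved the subgoal
    replanned: bool   # after a failed action, the agent revised its plan
    recovered: bool   # the revised plan got the agent back on track


def attribute_failure(steps: list[StepRecord]) -> Optional[Layer]:
    """Return the layer of the first unrecovered error, or None if none occurred."""
    for step in steps:
        if not step.plan_ok:
            return Layer.HIGH_LEVEL_PLANNING
        if not step.action_ok:
            if not step.replanned:
                return Layer.LOW_LEVEL_EXECUTION
            if not step.recovered:
                return Layer.REPLANNING
            # Failed action, but replanning recovered: keep scanning.
    return None


def failure_breakdown(trajectories: list[list[StepRecord]]) -> Counter:
    """Count failures per layer across trajectories (None = no failure found)."""
    return Counter(attribute_failure(t) for t in trajectories)


if __name__ == "__main__":
    trajectories = [
        [StepRecord("search for product", True, True, False, False),
         StepRecord("open unrelated page", False, True, False, False)],
        [StepRecord("fill shipping form", True, False, False, False)],
        [StepRecord("click submit", True, False, True, False)],
        [StepRecord("click submit", True, False, True, True),
         StepRecord("confirm order", True, True, False, False)],
    ]
    for layer, count in failure_breakdown(trajectories).items():
        print(layer.value if layer else "no failure", count)
```

A per-layer count, rather than a single success rate, is what lets this kind of evaluation point at the layer that needs work. The per-step labels would have to come from annotating real trajectories. This sketch assumes they are already available.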

Who Needs to Know This

AI engineers and researchers who build or evaluate LLM-based web agents can benefit from understanding where these agents fail and how a hierarchical planning perspective can improve their reliability.

Key Insight

💡 Evaluating a web agent layer by layer (high-level planning, low-level execution, replanning) shows where failures arise, not just whether the task succeeded end to end.

Full Article

Abstract:
arXiv:2603.14248v2

Large language model (LLM) web agents are increasingly used for web navigation but remain far from human reliability on realistic, long-horizon tasks. Existing evaluations focus primarily on end-to-end success, offering limited insight into where failures arise. We propose a hierarchical planning framework to analyze web agents across three layers (i.e., high-level planning, low-level execution, and replanning), enabling process-based evaluation…
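The three layers named in the abstract can be pictured as a generic plan, execute, replan loop. This is a minimal sketch under stated assumptions, not the authors' implementation. `plan_fn` stands in for an LLM planner that turns the task into subgoals. `execute_fn` stands in for the low-level layer that grounds a subgoal in browser actions and reports success. The replanning budget `max_replans` and all other names here are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable

Plan = list[str]
# Hypothetical interfaces: planner(task, completed, failed) -> remaining subgoals,
# executor(subgoal) -> True if the low-level actions achieved it.
PlanFn = Callable[[str, list[str], list[str]], Plan]
ExecuteFn = Callable[[str], bool]


@dataclass
class RunLog:
    """Per-layer trace, so failures can be inspected after the run."""
    plans: list[Plan] = field(default_factory=list)     # high-level planning outputs
    executed: list[str] = field(default_factory=list)   # subgoals executed successfully
    failures: list[str] = field(default_factory=list)   # subgoals whose execution failed


def run_hierarchical_agent(task: str, plan_fn: PlanFn, execute_fn: ExecuteFn,
                           max_replans: int = 2) -> tuple[bool, RunLog]:
    log = RunLog()
    completed: list[str] = []
    failed: list[str] = []

    # High-level planning: decompose the task into an ordered list of subgoals.
    plan = plan_fn(task, completed, failed)
    log.plans.append(list(plan))  # store a copy; `plan` is consumed below
    replans = 0

    while plan:
        subgoal = plan.pop(0)
        # Low-level execution: ground the subgoal in concrete browser actions.
        if execute_fn(subgoal):
            completed.append(subgoal)
            log.executed.append(subgoal)
            continue
        failed.append(subgoal)
        log.failures.append(subgoal)
        # Replanning: revise the remaining plan given what succeeded and what failed.
        if replans >= max_replans:
            return False, log
        replans += 1
        plan = plan_fn(task, completed, failed)
        log.plans.append(list(plan))

    return True, log


if __name__ == "__main__":
    def toy_planner(task, completed, failed):
        # Switch strategy once the direct search subgoal has failed.
        if "type query in search box" in failed:
            return ["use category menu", "open product page"]
        return ["open shop homepage", "type query in search box", "open product page"]

    def toy_executor(subgoal):
        return subgoal != "type query in search box"

    ok, log = run_hierarchical_agent("find a product page", toy_planner, toy_executor)
    print(ok, log.executed, log.failures, len(log.plans))
```

Keeping the layers as separate functions, with a log for each, is what makes it possible to ask afterwards whether a failed run went wrong in the plan, in an individual action, or in the recovery step.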
