BAPO: Boundary-Aware Policy Optimization for Reliable Agentic Search

📰 ArXiv cs.AI

arXiv:2601.11037v2 Announce Type: replace Abstract: RL-based agentic search enables LLMs to solve complex questions via dynamic planning and external search. While this approach significantly enhances accuracy with agent policies optimized via large-scale reinforcement learning, we identify a critical gap in reliability: these agents fail to recognize their reasoning boundaries and rarely admit "I DON'T KNOW" (IDK) even when evidence is insufficient or reasoning reaches its limit. The lack of

Published 22 Apr 2026
