No More Stale Feedback: Co-Evolving Critics for Open-World Agent Learning

📰 ArXiv cs.AI

arXiv:2601.06794v2 Announce Type: replace

Abstract: Critique-guided reinforcement learning (RL) has emerged as a powerful paradigm for training LLM agents by augmenting sparse outcome rewards with natural-language feedback. However, current methods often rely on static or offline critic models, which fail to adapt as the policy evolves. In on-policy RL, the agent's error patterns shift over time, so stationary critics grow stale and their feedback yields diminishing utility. To address…
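The abstract cuts off before describing the paper's method, so the following is only a minimal, hypothetical Python sketch of the general idea it motivates: a critic that is periodically re-fit to the policy's fresh failure modes ("co-evolving") versus one trained once and left static, whose feedback stops matching what the agent actually gets wrong. All names here (ToyPolicy, ToyCritic, refresh_every) are illustrative and not from the paper.

```python
import random
from collections import Counter

ACTIONS = ["solve", "guess", "stall"]  # only 'solve' earns the sparse reward

class ToyPolicy:
    """Stand-in for an LLM agent: samples actions from unnormalized weights."""
    def __init__(self):
        self.weights = {"solve": 0.1, "guess": 2.0, "stall": 1.0}

    def act(self):
        r = random.uniform(0, sum(self.weights.values()))
        for a, w in self.weights.items():
            r -= w
            if r <= 0:
                return a
        return ACTIONS[-1]

    def update(self, action, reward, critique):
        if reward > 0:
            self.weights[action] += 0.02      # sparse outcome signal: slow
        for a in ACTIONS:                     # critique names a failure mode:
            if f"avoid {a}" in critique and self.weights[a] > 0.1:
                self.weights[a] -= 0.05       # targeted, dense signal

class ToyCritic:
    """Critic fit to a snapshot of the policy's failures. Without refreshing,
    its belief about 'what goes wrong' goes stale as the policy improves."""
    def __init__(self, failure_mode):
        self.failure_mode = failure_mode

    def critique(self, action):
        return f"avoid {self.failure_mode}" if action == self.failure_mode else "looks fine"

    def refresh(self, recent_failures):
        # Co-evolution step: re-fit to the policy's *current* failure distribution.
        if recent_failures:
            self.failure_mode = Counter(recent_failures).most_common(1)[0][0]

def train(steps=300, refresh_every=None):
    policy, critic = ToyPolicy(), ToyCritic(failure_mode="guess")
    failures = []
    for t in range(steps):
        a = policy.act()
        reward = 1.0 if a == "solve" else 0.0
        policy.update(a, reward, critic.critique(a))
        if reward == 0.0:
            failures.append(a)
        if refresh_every and (t + 1) % refresh_every == 0:
            critic.refresh(failures[-100:])   # refresh on fresh on-policy rollouts
    return policy.weights["solve"] / sum(policy.weights.values())

if __name__ == "__main__":
    random.seed(0)
    print(f"static critic : P(solve) = {train():.2f}")
    random.seed(0)
    print(f"co-evolving   : P(solve) = {train(refresh_every=50):.2f}")
```

In this toy, the static critic keeps flagging "guess" after the policy has already stopped guessing, so later "stall" failures draw no useful feedback and learning falls back on the sparse reward alone; the refreshed critic redirects its feedback to the new dominant failure mode and the policy converges on "solve" noticeably faster.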

Published 15 Apr 2026