No More Stale Feedback: Co-Evolving Critics for Open-World Agent Learning
📰 ArXiv cs.AI
arXiv:2601.06794v2 (Announce Type: replace)

Abstract: Critique-guided reinforcement learning (RL) has emerged as a powerful paradigm for training LLM agents by augmenting sparse outcome rewards with natural-language feedback. However, current methods often rely on static or offline critic models, which fail to adapt as the policy evolves. In on-policy RL, the agent's error patterns shift over time, causing stationary critics to become stale and providing feedback of diminishing utility. To address
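The abstract cuts off before describing the proposed method, but the failure mode it names — a frozen critic whose feedback loses utility as the on-policy error distribution drifts — can be illustrated with a toy simulation. This is a minimal sketch under illustrative assumptions (the `feedback_utility` measure, the linear error drift, and all constants are hypothetical, not the paper's algorithm):

```python
import random

def feedback_utility(error, critic_model):
    """Toy utility of a critique: high only when the critic's internal model
    of the agent's 'typical mistake' matches the error actually being judged."""
    return max(0.0, 1.0 - abs(error - critic_model))

def run(co_evolve, steps=30, seed=0):
    """Simulate training with either a frozen critic or one refit each step.

    Returns the average feedback utility over the whole run.
    """
    rng = random.Random(seed)
    critic_model = 1.0  # critic calibrated to the policy's *initial* errors
    utilities = []
    for t in range(steps):
        # On-policy drift: as the agent improves, its typical error shrinks.
        error_level = 1.0 - t / steps
        rollout_errors = [error_level + rng.uniform(-0.05, 0.05)
                          for _ in range(8)]
        for e in rollout_errors:
            utilities.append(feedback_utility(e, critic_model))
        if co_evolve:
            # Co-evolution step: refit the critic on the current policy's
            # rollouts so its feedback tracks the shifting error pattern.
            critic_model = sum(rollout_errors) / len(rollout_errors)
    return sum(utilities) / len(utilities)
```

Running both variants shows the gap the abstract describes: the co-evolving critic's average feedback utility stays near its initial level, while the frozen critic's decays steadily as the error distribution drifts away from what it was trained on.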