Behavior-Consistent Deep Reinforcement Learning

arXiv cs.AI

arXiv:2605.21214v2 Announce Type: cross

Abstract: Reinforcement learning (RL) often exhibits high variance across training runs, leading to unreliable performance and posing a major challenge to deployment in real-world domains. In this work, we address the challenge of cross-run policy divergence by formalizing the problem of behavior-consistent RL, where the objective is to obtain policies that are both high-performing and distributionally similar across training runs. Our key observation is t

Published 21 May 2026
