On the Structural Non-Preservation of Epistemic Behaviour under Policy Transformation

📰 arXiv cs.AI

Research on reinforcement learning (RL) agents under partial observability identifies limits on how well epistemic behaviour is preserved when a policy is transformed.

Advanced · Published 23 Mar 2026

Action Steps

Define behavioural dependency as variation in action selection with respect to internal information under fixed observations
Formalize a probe-relative notion of ε-behavioural equivalence
Analyze within-policy transformations to identify where epistemic behaviour fails to be preserved
Apply findings to improve RL agent design and performance under partial observability
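The first two steps can be made concrete with a small sketch. The summary does not give the paper's formal definitions, so everything below is an assumption for illustration: a stochastic policy is modelled as a hypothetical function `policy(obs, internal)` returning a probability vector over a finite action set, behavioural dependency is measured as the largest total-variation distance between action distributions across internal states for one fixed observation, and ε-behavioural equivalence is checked only on a finite, hypothetical probe set of (observation, internal state) pairs.

```python
from itertools import combinations
from typing import Callable, Hashable, Iterable, Sequence, Tuple

# Hypothetical policy type: maps (observation, internal state) to action probabilities.
Policy = Callable[[Hashable, Hashable], Sequence[float]]


def total_variation(p: Sequence[float], q: Sequence[float]) -> float:
    """Total-variation distance between two distributions over the same actions."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))


def behavioural_dependency(policy: Policy, obs: Hashable,
                           internal_states: Iterable[Hashable]) -> float:
    """Assumed measure: the largest change in action distribution caused by
    varying only the internal state while the observation is held fixed."""
    dists = [policy(obs, m) for m in internal_states]
    return max((total_variation(p, q) for p, q in combinations(dists, 2)),
               default=0.0)


def epsilon_equivalent(pi_a: Policy, pi_b: Policy,
                       probes: Iterable[Tuple[Hashable, Hashable]],
                       eps: float) -> bool:
    """Probe-relative check: the policies count as equivalent if their action
    distributions differ by at most eps on every probe; inputs outside the
    probe set are never examined."""
    return all(total_variation(pi_a(o, m), pi_b(o, m)) <= eps for o, m in probes)


# Hypothetical two-action policy whose choice depends on a binary belief bit.
def curious(obs, belief):
    return [0.75, 0.25] if belief == 0 else [0.25, 0.75]


print(behavioural_dependency(curious, "door", [0, 1]))  # 0.5
print(epsilon_equivalent(curious, curious, [("door", 0), ("door", 1)], 0.0))  # True
```

Total variation is used here only because it is a simple, bounded distance between action distributions; any other divergence could be swapped in without changing the structure of the checks.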

Who Needs to Know This

ML researchers and AI engineers working on reinforcement learning under partial observability, since the structural non-preservation of epistemic behaviour affects how more effective RL agents can be designed.

Key Insight

💡 Epistemic behaviour in RL agents is not structurally preserved under policy transformation, which can limit their effectiveness in complex, partially observable environments.
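A deliberately small toy case shows how a probe-relative guarantee can leave epistemic behaviour unprotected. This is an illustration under stated assumptions, not the construction used in the paper: the policy, the `freeze_internal` transformation and the single-entry probe set are all hypothetical. The transformed policy matches the original exactly on every probe (so it is ε-equivalent for any ε ≥ 0), yet its dependency on the internal belief drops from 0.5 to 0.

```python
from typing import Callable, Dict, Hashable, Sequence

# Hypothetical policy type: maps (observation, internal state) to action probabilities.
Policy = Callable[[Hashable, Hashable], Sequence[float]]


def total_variation(p: Sequence[float], q: Sequence[float]) -> float:
    """Total-variation distance between two distributions over the same actions."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))


def original(obs, belief):
    # Hypothetical agent: acts on an internal belief bit, not only on the observation.
    return [0.75, 0.25] if belief == 0 else [0.25, 0.75]


def freeze_internal(policy: Policy, reference: Dict[Hashable, Hashable]) -> Policy:
    """Hypothetical transformation: for each observation, always behave as the
    original policy does in one reference internal state, ignoring the actual one."""
    def transformed(obs, belief):
        return policy(obs, reference[obs])
    return transformed


probes = [("door", 0)]  # finite probe set that never varies the belief
flat = freeze_internal(original, {"door": 0})

# Worst-case disagreement on the probes (the quantity an ε-check would bound).
probe_gap = max(total_variation(original(o, m), flat(o, m)) for o, m in probes)
# Dependency on the belief with the observation held fixed, before and after.
dep_before = total_variation(original("door", 0), original("door", 1))
dep_after = total_variation(flat("door", 0), flat("door", 1))

print(probe_gap, dep_before, dep_after)  # 0.0 0.5 0.0
```

The gap comes from the probe set: because no probe varies the internal state while holding the observation fixed, equivalence on the probes says nothing about the dependency. Adding `("door", 1)` as a probe would catch this particular transformation, but the toy case only shows that probe-relative equivalence on its own does not imply that epistemic behaviour is preserved.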