StructRL: Recovering Dynamic Programming Structure from Learning Dynamics in Distributional Reinforcement Learning

📰 ArXiv cs.AI

arXiv:2604.08620v1, Announce Type: cross

Abstract: Reinforcement learning is typically treated as a uniform, data-driven optimization process, where updates are guided by rewards and temporal-difference errors without explicitly exploiting global structure. In contrast, dynamic programming methods rely on structured information propagation, enabling efficient and stable learning. In this paper, we provide evidence that such structure can be recovered from the learning dynamics of distributional r

Published 13 Apr 2026
