Frictional Q-Learning

📰 ArXiv cs.AI

arXiv:2509.19771v4

Abstract: Off-policy reinforcement learning suffers from extrapolation errors when a learned policy selects actions that are weakly supported in the replay buffer. In this study, we address this issue by drawing an analogy to static friction. From this perspective, the replay buffer is represented as a smooth, low-dimensional action manifold, where the support directions correspond to the tangential component, while the normal component captures th

Published 9 May 2026
