COOPO: Cyclic Offline-Online Policy Optimization Algorithm

📰 ArXiv cs.AI

arXiv:2605.18675v1 (Announce Type: cross)

Abstract: Offline reinforcement learning struggles with distributional shift and constrained performance due to static dataset limitations, while online RL demands prohibitive environment interactions. The recent advent of hybrid offline-to-online methods bridges these domains but suffers from distribution drift during transitions and catastrophic forgetting of offline knowledge. We introduce COOPO (Cyclic Offline-Online Policy Optimization), a generalized

Published 19 May 2026