COOPO: Cyclic Offline-Online Policy Optimization Algorithm
📰 ArXiv cs.AI
arXiv:2605.18675v1 Announce Type: cross
Abstract: Offline reinforcement learning struggles with distributional shift, and its performance is constrained by the limitations of a static dataset, while online RL demands prohibitive environment interactions. Recent hybrid offline-to-online methods bridge these domains but suffer from distribution drift during transitions and catastrophic forgetting of offline knowledge. We introduce COOPO (Cyclic Offline-Online Policy Optimization), a generalized