CROP: Conservative Reward for Model-based Offline Policy Optimization

📰 ArXiv cs.AI

arXiv:2310.17245v2 Announce Type: replace-cross Abstract: Offline reinforcement learning (RL) aims to optimize a policy from previously collected data, without further online interaction. Model-based approaches are particularly appealing for offline RL because they can compensate for limited data coverage by generating additional data with a learned model. Nonetheless, a prevalent issue in offline RL is value overestimation caused by distribution shift. This study proposes a novel model-based offline RL method, CROP (Conservative Reward for Model-based Offline Policy Optimization). …
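To make the idea of a conservative reward concrete, here is a minimal sketch of how a reward model could be trained to underestimate rewards outside the data support: fit the logged rewards by regression while penalizing the predicted reward of random (likely out-of-distribution) actions. The names `RewardNet` and `conservative_reward_loss`, the uniform action sampling, and the penalty weight `beta` are all illustrative assumptions, not the paper's actual code or exact objective.

```python
# Hedged sketch of conservative reward training for model-based offline RL.
# All names and the precise loss form are assumptions for illustration.
import torch
import torch.nn as nn

class RewardNet(nn.Module):
    """Simple MLP reward model r_phi(s, a)."""
    def __init__(self, state_dim: int, action_dim: int, hidden: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, state: torch.Tensor, action: torch.Tensor) -> torch.Tensor:
        return self.net(torch.cat([state, action], dim=-1)).squeeze(-1)

def conservative_reward_loss(reward_net, states, actions, rewards, beta=0.5):
    """Fit rewards on the dataset while pushing down the predicted reward
    of random actions, so the model underestimates rewards for state-action
    pairs far from the data distribution."""
    # Standard regression term on logged (s, a, r) tuples.
    mse = ((reward_net(states, actions) - rewards) ** 2).mean()
    # Conservatism term: mean predicted reward of uniformly sampled actions
    # (assumed action range [-1, 1]); minimizing it lowers OOD reward estimates.
    random_actions = torch.empty_like(actions).uniform_(-1.0, 1.0)
    penalty = reward_net(states, random_actions).mean()
    return mse + beta * penalty

# Usage on a toy batch of synthetic transitions.
state_dim, action_dim, batch = 17, 6, 256
net = RewardNet(state_dim, action_dim)
opt = torch.optim.Adam(net.parameters(), lr=3e-4)
s = torch.randn(batch, state_dim)
a = torch.rand(batch, action_dim) * 2 - 1
r = torch.randn(batch)
loss = conservative_reward_loss(net, s, a, r)
opt.zero_grad()
loss.backward()
opt.step()
```

The design intuition, under these assumptions, is that a reward model biased downward on out-of-distribution actions discourages the planner or policy from exploiting model error during synthetic rollouts, countering the overestimation described in the abstract.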

Published 14 Apr 2026