CROP: Conservative Reward for Model-based Offline Policy Optimization
📰 ArXiv cs.AI
arXiv:2310.17245v2 (announce type: replace-cross)
Abstract: Offline reinforcement learning (RL) aims to optimize a policy using collected data without online interactions. Model-based approaches are particularly appealing for addressing offline RL challenges because of their capability to mitigate the limitations of data coverage through data generation using models. Nonetheless, a prevalent issue in offline RL is the overestimation caused by distribution shift. This study proposes a novel model-b