Sample-efficient Neuro-symbolic Proximal Policy Optimization
📰 ArXiv cs.AI
arXiv:2604.25534v1
Abstract: Deep Reinforcement Learning (DRL) algorithms often require a large amount of data and struggle in sparse-reward domains with long planning horizons and multiple sub-goals. In this paper, we propose a neuro-symbolic extension of Proximal Policy Optimization (PPO) that transfers partial logical policy specifications learned in easier instances to guide learning in more challenging settings. We introduce two integrations of symbolic guidance: (i) H-PP