Sample-efficient Neuro-symbolic Proximal Policy Optimization

📰 ArXiv cs.AI

Learn how to improve sample efficiency in deep reinforcement learning using neuro-symbolic proximal policy optimization, enabling better performance in sparse-reward domains

advanced Published 29 Apr 2026

Action Steps

Implement the Proximal Policy Optimization (PPO) algorithm
Integrate symbolic guidance into PPO using partial logical policy specifications
Transfer learned policies from easier instances to more challenging settings
Evaluate the performance of the neuro-symbolic PPO in sparse-reward domains
Compare the sample efficiency of the proposed method with traditional PPO
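
The first two steps above can be sketched in a few lines of plain Python. This is a minimal illustration, not the paper's method: the clipped surrogate loss is the standard PPO objective, while `guided_policy`, `suggested_actions`, and `bias` are hypothetical names for one generic way to inject a symbolic hint (a soft bonus on the logits of actions that a logical rule recommends). The paper's own two integrations are not described in the excerpt here, so they may differ.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def guided_policy(logits, suggested_actions, bias=2.0):
    """Assumed guidance scheme: add a fixed bonus to the logits of the
    actions a symbolic rule suggests. The preference is soft, so actions
    the rule does not suggest keep a non-zero probability."""
    biased = [x + bias if a in suggested_actions else x
              for a, x in enumerate(logits)]
    return softmax(biased)


def ppo_clip_loss(new_probs, old_probs, advantages, eps=0.2):
    """Standard PPO clipped surrogate, negated so it can be minimized.

    Each argument holds one entry per sampled (state, action) pair:
    the probability of the taken action under the new and old policy,
    and the advantage estimate for that pair."""
    total = 0.0
    for p_new, p_old, adv in zip(new_probs, old_probs, advantages):
        ratio = p_new / p_old
        clipped = min(max(ratio, 1.0 - eps), 1.0 + eps)
        total += min(ratio * adv, clipped * adv)
    return -total / len(advantages)


# Usage: three actions with equal logits; the rule suggests action 1.
probs = guided_policy([0.0, 0.0, 0.0], suggested_actions={1})

# The ratio 0.5 / 0.25 = 2.0 is clipped to 1 + eps for a positive advantage.
loss = ppo_clip_loss(new_probs=[0.5], old_probs=[0.25], advantages=[1.0])
```

A soft logit bonus is used here instead of a hard action mask so that the policy can still override a hint that turns out to be wrong in the harder setting. Whether that trade-off matches the paper's design is an assumption.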

Who Needs to Know This

Researchers and engineers working on reinforcement learning and robotics can use this technique to reduce the amount of training data their agents need, especially in sparse-reward domains with long planning horizons and multiple sub-goals

Key Insight

💡 Partial logical policy specifications learned on easier instances can guide PPO in harder, sparse-reward settings, which the paper proposes as a way to improve sample efficiency in deep reinforcement learning

Full Article

Title: Sample-efficient Neuro-symbolic Proximal Policy Optimization

arXiv:2604.25534v1

Abstract:
Deep Reinforcement Learning (DRL) algorithms often require a large amount of data and struggle in sparse-reward domains with long planning horizons and multiple sub-goals. In this paper, we propose a neuro-symbolic extension of Proximal Policy Optimization (PPO) that transfers partial logical policy specifications learned in easier instances to guide learning in more challenging settings. We introduce two integrations of symbolic guidance: (i) H-PP
