When Adaptive Rewards Hurt: Causal Probing and the Switching-Stability Dilemma in LLM-Guided LEO Satellite Scheduling
📰 ArXiv cs.AI
Adaptive rewards in deep reinforcement learning can hurt performance in LEO satellite scheduling due to the switching-stability dilemma
Action Steps
- Identify how adaptive reward shaping can hurt deep reinforcement learning performance
- Understand the switching-stability dilemma and its impact on scheduler performance
- Evaluate the trade-offs between static and dynamic reward weights
- Consider the quasi-stationarity requirement that value-function estimation places on reward signals
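The trade-off in the steps above can be illustrated with a minimal, self-contained sketch. This is not the paper's method: the objective terms, weight vectors, and switch point below are synthetic assumptions chosen only to show why an abrupt weight switch makes the scalarized reward non-stationary for a PPO critic.

```python
import numpy as np

def blended_reward(objectives: np.ndarray, weights: np.ndarray) -> np.ndarray:
    """Scalarize multi-objective reward terms with a weight vector."""
    return objectives @ weights

# Hypothetical per-step objective terms for a LEO scheduling task:
# throughput (scale ~0-10), latency penalty and power penalty (~0-1).
# Purely illustrative data, not from the paper.
rng = np.random.default_rng(0)
objectives = np.column_stack([
    rng.uniform(0, 10, size=1000),  # throughput
    rng.uniform(0, 1, size=1000),   # latency penalty
    rng.uniform(0, 1, size=1000),   # power penalty
])

# Near-constant weights: the scalarized reward stays quasi-stationary,
# so a value function fitted early in training remains valid later.
static_w = np.array([0.5, 0.3, 0.2])
static_rewards = blended_reward(objectives, static_w)

# Dynamic weights that switch abruptly mid-training: the same states
# now yield rewards on a different scale, invalidating earlier
# value estimates -- the switching side of the dilemma.
dynamic_rewards = np.concatenate([
    blended_reward(objectives[:500], np.array([0.8, 0.1, 0.1])),
    blended_reward(objectives[500:], np.array([0.1, 0.1, 0.8])),
])

# Compare mean reward before and after the switch point: the static
# scheme is stable across segments; the dynamic scheme shifts sharply.
print("static :", static_rewards[:500].mean(), static_rewards[500:].mean())
print("dynamic:", dynamic_rewards[:500].mean(), dynamic_rewards[500:].mean())
```

The shift in reward scale at the switch point is the non-stationarity that an on-policy critic like PPO's must chase, which is one way near-constant weights can end up outperforming adaptive ones.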
Who Needs to Know This
Machine learning researchers and engineers working on reinforcement learning or satellite scheduling can benefit from understanding the limitations of adaptive rewards: knowing when reward weights must stay near-constant can inform reward design choices and improve model performance.
Key Insight
💡 Near-constant reward weights can outperform dynamic weights because PPO's value function estimation needs quasi-stationary reward signals
Share This
💡 Adaptive rewards can hurt performance in LEO satellite scheduling due to the switching-stability dilemma
DeepCamp AI