When Adaptive Rewards Hurt: Causal Probing and the Switching-Stability Dilemma in LLM-Guided LEO Satellite Scheduling

📰 ArXiv cs.AI

Adaptive rewards in deep reinforcement learning can hurt performance in LEO satellite scheduling due to the switching-stability dilemma

Published 7 Apr 2026
Action Steps
  1. Identify where adaptive reward weighting enters a deep RL scheduler
  2. Understand the switching-stability dilemma: each reward re-weighting shifts the signal the value function was trained on
  3. Evaluate the trade-offs between static and dynamic reward weights (see the sketch after this list)
  4. Check whether the value function's need for a quasi-stationary reward signal is met
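
To make the static-vs-dynamic trade-off concrete, here is a minimal sketch of the two weighting schemes for a composite scheduling reward. The component names (throughput, latency, energy) and all weight values are illustrative assumptions, not taken from the paper:

```python
# Hypothetical composite reward for one LEO scheduling step. Component
# names and weight values are illustrative, not taken from the paper.
def composite_reward(metrics: dict, weights: dict) -> float:
    """Weighted sum of per-step scheduling metrics."""
    return sum(weights[k] * metrics[k] for k in weights)

# Static scheme: weights are fixed for the whole run, so the reward
# signal the PPO critic regresses against stays (quasi-)stationary.
STATIC_WEIGHTS = {"throughput": 0.5, "latency": 0.3, "energy": 0.2}

# Adaptive scheme: weights are re-balanced toward whichever normalized
# objective currently lags. Each re-balancing redefines the reward scale,
# so previously learned value estimates go stale -- the "switching" horn
# of the switching-stability dilemma.
def adaptive_weights(running_metrics: dict) -> dict:
    shortfall = {k: max(0.0, 1.0 - v) for k, v in running_metrics.items()}
    total = sum(shortfall.values()) or 1.0
    return {k: s / total for k, s in shortfall.items()}
```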
Who Needs to Know This

Machine learning researchers and engineers working on reinforcement learning and satellite scheduling can benefit from understanding the limitations of adaptive rewards, which can inform their reward-design choices and improve their models' performance

Key Insight

💡 Near-constant reward weights can outperform dynamic ones because PPO's value function estimation relies on a quasi-stationary reward signal
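
One way to read this insight operationally: if weights must adapt at all, bound how far they may drift per policy update so the return scale the critic regresses against stays quasi-stationary. The sketch below is our assumption-laden illustration, not the paper's method; the cap value and renormalization are arbitrary choices:

```python
# A minimal sketch, assuming reward weights live in a dict summing to 1:
# cap each weight's per-update drift so the reward scale changes slowly
# enough for the critic to track. The 0.01 cap is an illustrative value,
# not a recommendation from the paper.
def limit_weight_drift(old_w: dict, new_w: dict, max_step: float = 0.01) -> dict:
    limited = {
        k: old_w[k] + max(-max_step, min(max_step, new_w[k] - old_w[k]))
        for k in old_w
    }
    total = sum(limited.values())  # renormalize so weights still sum to 1
    return {k: v / total for k, v in limited.items()}
```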
