When Adaptive Rewards Hurt: Causal Probing and the Switching-Stability Dilemma in LLM-Guided LEO Satellite Scheduling

📰 ArXiv cs.AI

Adaptive rewards in deep reinforcement learning can hurt performance in LEO satellite scheduling due to the switching-stability dilemma

Published 7 Apr 2026
Action Steps
  1. Identify where adaptive reward weighting enters a deep RL scheduler
  2. Understand the switching-stability dilemma: each reward re-weighting shifts the signal the value function was trained on
  3. Evaluate the trade-offs between static and dynamic reward weights (see the sketch after this list)
  4. Check whether the value function's need for a quasi-stationary reward signal is met
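
To make the static-vs-dynamic trade-off concrete, here is a minimal sketch of the two weighting schemes for a composite scheduling reward. The component names (throughput, latency, energy) and all weight values are illustrative assumptions, not taken from the paper:

```python
# Hypothetical composite reward for one LEO scheduling step. Component
# names and weight values are illustrative, not taken from the paper.
def composite_reward(metrics: dict, weights: dict) -> float:
    """Weighted sum of per-step scheduling metrics."""
    return sum(weights[k] * metrics[k] for k in weights)

# Static scheme: weights are fixed for the whole run, so the reward
# signal the PPO critic regresses against stays (quasi-)stationary.
STATIC_WEIGHTS = {"throughput": 0.5, "latency": 0.3, "energy": 0.2}

# Adaptive scheme: weights are re-balanced toward whichever normalized
# objective currently lags. Each re-balancing redefines the reward scale,
# so previously learned value estimates go stale -- the "switching" horn
# of the switching-stability dilemma.
def adaptive_weights(running_metrics: dict) -> dict:
    shortfall = {k: max(0.0, 1.0 - v) for k, v in running_metrics.items()}
    total = sum(shortfall.values()) or 1.0
    return {k: s / total for k, s in shortfall.items()}
```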
Who Needs to Know This

Machine learning researchers and engineers working on reinforcement learning and satellite scheduling can benefit from understanding the limitations of adaptive rewards, which can inform their reward-design choices and improve their models' performance

Key Insight

💡 Near-constant reward weights can outperform dynamic ones because PPO's value function estimation relies on a quasi-stationary reward signal
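
One way to read this insight operationally: if weights must adapt at all, bound how far they may drift per policy update so the return scale the critic regresses against stays quasi-stationary. The sketch below is our assumption-laden illustration, not the paper's method; the cap value and renormalization are arbitrary choices:

```python
# A minimal sketch, assuming reward weights live in a dict summing to 1:
# cap each weight's per-update drift so the reward scale changes slowly
# enough for the critic to track. The 0.01 cap is an illustrative value,
# not a recommendation from the paper.
def limit_weight_drift(old_w: dict, new_w: dict, max_step: float = 0.01) -> dict:
    limited = {
        k: old_w[k] + max(-max_step, min(max_step, new_w[k] - old_w[k]))
        for k in old_w
    }
    total = sum(limited.values())  # renormalize so weights still sum to 1
    return {k: v / total for k, v in limited.items()}
```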
