Gradient Iterated Temporal-Difference Learning

arXiv cs.AI

arXiv:2603.07833v2 · Announce Type: replace-cross

Abstract: Temporal-difference (TD) learning is highly effective for evaluating and controlling an agent's long-term outcomes. To speed up learning, most approaches in this paradigm use a semi-gradient update, which ignores the gradient of the bootstrapped estimate. While popular, this type of update is prone to divergence, as Baird's counterexample illustrates. Gradient TD methods were introduced to overcome this issue, but …
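The contrast between a semi-gradient update and one that differentiates through the bootstrapped estimate can be reproduced in a few lines. The sketch below is illustrative only and is not the method proposed in this paper. It runs expected (synchronous, DP-style) updates under a uniform state weighting on Baird's seven-state, eight-weight MDP as it is commonly presented in Sutton & Barto's textbook: zero rewards, γ = 0.99, a target policy that always moves to the seventh state, and the usual initialization w = (1, 1, 1, 1, 1, 1, 10, 1). The step size α = 0.01, the number of sweeps, and every function name are assumptions chosen for illustration. For contrast, it also runs the residual-gradient update, which keeps the gradient of v(s′). Note that this is plain gradient descent on the squared TD error, not one of the gradient TD methods (such as GTD2 or TDC) that the abstract refers to.

```python
# Minimal sketch (assumptions noted in the lead-in): expected updates on
# Baird's counterexample, comparing semi-gradient TD(0) with the
# residual-gradient update. All function names are hypothetical.

GAMMA = 0.99   # discount factor assumed for this illustration
N_STATES = 7
N_WEIGHTS = 8
INITIAL_W = [1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 10.0, 1.0]


def features(s):
    """Feature vector for state s (0-indexed, 0..6)."""
    x = [0.0] * N_WEIGHTS
    if s < 6:
        x[s] = 2.0   # first six states: v(s) = 2 * w[s] + w[7]
        x[7] = 1.0
    else:
        x[6] = 1.0   # seventh state: v = w[6] + 2 * w[7]
        x[7] = 2.0
    return x


def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


def norm(v):
    return dot(v, v) ** 0.5


X = [features(s) for s in range(N_STATES)]
X_NEXT = X[6]  # the target policy always moves to the seventh state


def td_errors(w):
    """TD error per state under the target policy: 0 + gamma * v(s') - v(s)."""
    v_next = dot(X_NEXT, w)
    return [GAMMA * v_next - dot(x, w) for x in X]


def semi_gradient_step(w, alpha):
    """Semi-gradient TD(0): gamma * v(s') is treated as a constant target,
    so only grad v(s) = x(s) appears in the update."""
    deltas = td_errors(w)
    return [
        wi + alpha / N_STATES * sum(d * x[i] for d, x in zip(deltas, X))
        for i, wi in enumerate(w)
    ]


def residual_gradient_step(w, alpha):
    """Gradient descent on the mean squared TD error: the gradient also
    flows through v(s'), giving the direction x(s) - gamma * x(s')."""
    deltas = td_errors(w)
    return [
        wi + alpha / N_STATES * sum(d * (x[i] - GAMMA * X_NEXT[i]) for d, x in zip(deltas, X))
        for i, wi in enumerate(w)
    ]


def mean_squared_td_error(w):
    return sum(d * d for d in td_errors(w)) / N_STATES


def run(step, alpha=0.01, sweeps=5000):
    """Apply `sweeps` expected updates starting from INITIAL_W."""
    w = list(INITIAL_W)
    for _ in range(sweeps):
        w = step(w, alpha)
    return w


if __name__ == "__main__":
    w_semi = run(semi_gradient_step)
    w_resid = run(residual_gradient_step)
    print(f"initial |w|           = {norm(INITIAL_W):.3g}")
    print(f"semi-gradient |w|     = {norm(w_semi):.3g}")
    print(f"residual-gradient |w| = {norm(w_resid):.3g}")
```

Under these assumptions the semi-gradient weights grow without bound, while the residual-gradient weights stay bounded and the mean squared TD error decreases monotonically. Because transitions under the target policy are deterministic here, the residual-gradient update needs no double sampling. With stochastic transitions it would, which is one reason it is rarely used in practice.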

Published 16 May 2026
