Q-learning with Adjoint Matching

📰 ArXiv cs.AI

arXiv:2601.14234v3. Abstract: We propose Q-learning with Adjoint Matching (QAM), a novel TD-based reinforcement learning (RL) algorithm that tackles a long-standing challenge in continuous-action RL: efficient optimization of an expressive diffusion or flow-matching policy with respect to a parameterized Q-function. Effective optimization requires exploiting the first-order information of the critic, but it is challenging to do so for flow or diffusion policies because…

Published 12 May 2026