Diff-Instruct with Diffused Reward: Towards Principled One-step Generator RL

📰 ArXiv cs.AI

arXiv:2605.24001v1 (announce type: cross)

Abstract: Recent advances in one-step text-to-image generation have enabled real-time synthesis with remarkable efficiency and quality. Previous reinforcement learning methods for one-step generators combine image-space reward optimization with diffusion noisy-space distribution matching. This paradigm brings challenges due to a mismatch between terminal reward optimization and the underlying generative dynamics. As a result, optimization tends to exploit

Published 26 May 2026
