SIPO: Stabilized and Improved Preference Optimization for Aligning Diffusion Models

📰 ArXiv cs.AI

arXiv:2505.21893v3 · Announce Type: replace-cross

Abstract: Preference learning has garnered extensive attention as an effective technique for aligning diffusion models with human preferences in visual generation. However, existing alignment approaches such as Diffusion-DPO suffer from two fundamental challenges: training instability caused by high gradient variances at various timesteps and high parameter sensitivities, and off-policy bias arising from the discrepancy between the optimization data …
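The abstract names Diffusion-DPO as the baseline whose instability it targets. To make that baseline concrete, here is a minimal sketch of a per-sample Diffusion-DPO-style loss at one sampled timestep, in plain Python. It is an illustration under stated assumptions, not SIPO and not any official implementation: the function names are hypothetical, noise vectors are plain lists rather than tensors, and timestep-dependent weighting constants are assumed to be folded into `beta`.

```python
import math


def log_sigmoid(z: float) -> float:
    # Numerically stable log(sigmoid(z)) for both signs of z.
    if z >= 0:
        return -math.log1p(math.exp(-z))
    return z - math.log1p(math.exp(z))


def squared_error(a, b) -> float:
    # Sum of squared differences between true and predicted noise.
    return sum((x - y) ** 2 for x, y in zip(a, b))


def diffusion_dpo_loss(eps_w, eps_l, pred_w, pred_l, ref_pred_w, ref_pred_l, beta):
    """Hypothetical per-sample Diffusion-DPO-style loss at one timestep.

    eps_w / eps_l: noise added to the preferred (w) and dispreferred (l) image.
    pred_*: noise predicted by the trainable model.
    ref_pred_*: noise predicted by the frozen reference model.
    beta: regularization strength (assumed to absorb timestep weighting).
    """
    # How much better the trainable model denoises the preferred sample
    # than the dispreferred one (lower is better).
    model_diff = squared_error(eps_w, pred_w) - squared_error(eps_l, pred_l)
    # The same quantity for the frozen reference model.
    ref_diff = squared_error(eps_w, ref_pred_w) - squared_error(eps_l, ref_pred_l)
    # The loss falls when the trainable model improves on the preferred
    # sample relative to the reference, and rises otherwise.
    return -log_sigmoid(-beta * (model_diff - ref_diff))
```

When the trainable model's predictions equal the reference model's, the two differences cancel and the loss is exactly log 2; it drops below log 2 only when the model denoises the preferred image better than the reference does, relative to the dispreferred one.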

Published 19 May 2026