SIPO: Stabilized and Improved Preference Optimization for Aligning Diffusion Models
📰 ArXiv cs.AI
arXiv:2505.21893v3 (announce type: replace-cross)
Abstract: Preference learning has garnered extensive attention as an effective technique for aligning diffusion models with human preferences in visual generation. However, existing alignment approaches such as Diffusion-DPO suffer from two fundamental challenges: training instability, caused by high gradient variance across timesteps and high parameter sensitivity, and off-policy bias arising from the discrepancy between the optimization dat