Decomposing the Delta: What Do Models Actually Learn from Preference Pairs?

📰 ArXiv cs.AI

arXiv:2604.08723v1 Announce Type: cross

Abstract: Preference optimization methods such as DPO and KTO are widely used for aligning language models, yet little is understood about what properties of preference data drive downstream reasoning gains. We ask: what aspects of a preference pair improve a reasoning model's performance on general reasoning tasks? We investigate two distinct notions of quality delta in preference data: generator-level delta, arising from the differences in capability bet…
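As a reference point for the DPO objective the abstract mentions (this sketch is not from the paper itself), the per-pair loss compares how much the policy prefers the chosen response over the rejected one, relative to a frozen reference model. The function and variable names below are illustrative, not the paper's notation:

```python
import math

def dpo_loss(logp_w, logp_l, ref_logp_w, ref_logp_l, beta=0.1):
    """DPO loss for one preference pair (minimal sketch).

    logp_w / logp_l       : policy log-probs of chosen (w) and rejected (l) responses
    ref_logp_w / ref_logp_l: reference-model log-probs of the same responses
    beta                   : strength of the KL-style regularization toward the reference
    """
    # Implicit reward margin: how much more the policy has shifted toward
    # the chosen response than the reference model has.
    margin = beta * ((logp_w - ref_logp_w) - (logp_l - ref_logp_l))
    # Loss is -log(sigmoid(margin)); zero margin gives log(2).
    return math.log(1.0 + math.exp(-margin))

# Policy favors the chosen response more than the reference does,
# so the margin is positive and the loss drops below log(2).
loss = dpo_loss(logp_w=-10.0, logp_l=-12.0, ref_logp_w=-11.0, ref_logp_l=-11.0)
```

Note that this per-pair loss depends only on log-probability deltas, which is why the quality gap between the chosen and rejected responses in each pair matters so directly.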

Published 13 Apr 2026
Read full paper →