Hidden Consensus: Preference-Validity Compression in Human Feedback

📰 ArXiv cs.AI

arXiv:2606.10569v1 Announce Type: cross

Abstract: Standard RLHF pipelines often reduce heterogeneous human judgments into a single scalar reward target. We argue that this reduction can mis-measure alignment in structurally plural societies, where disagreement may reflect culturally, historically, linguistically, regionally, or normatively grounded interpretations rather than annotation noise. We call this failure Preference-Validity Compression, the collapse of multiple plural-valid response op

Published 10 Jun 2026