Hidden Consensus:Preference-Validity Compression in Human Feedback
📰 ArXiv cs.AI
arXiv:2606.10569v1 Announce Type: cross Abstract: Standard RLHF pipelines often reduce heterogeneous human judgments into a single scalar reward target. We argue that this reduction can mis-measure alignment in structurally plural societies, where disagreement may reflect culturally, historically, linguistically, regionally, or normatively grounded interpretations rather than annotation noise. We call this failure Preference-Validity Compression, the collapse of multiple plural-valid response op
DeepCamp AI