Alignment Makes Language Models Normative, Not Descriptive

📰 ArXiv cs.AI

arXiv:2603.17218v2

Abstract: Post-training alignment optimizes language models to match human preference signals, but this objective is not equivalent to modeling observed human behavior. We compare 120 base-aligned model pairs on more than 10,000 real human decisions in multi-round strategic games: bargaining, persuasion, negotiation, and repeated matrix games. In these settings, base models outperform their aligned counterparts in predicting human choices by nearl

Published 27 May 2026
