Alignment Makes Language Models Normative, Not Descriptive
📰 ArXiv cs.AI
arXiv:2603.17218v2. Abstract: Post-training alignment optimizes language models to match human preference signals, but this objective is not equivalent to modeling observed human behavior. We compare 120 base-aligned model pairs on more than 10,000 real human decisions in multi-round strategic games: bargaining, persuasion, negotiation, and repeated matrix games. In these settings, base models outperform their aligned counterparts in predicting human choices by nearl