DiffScore: Text Evaluation Beyond Autoregressive Likelihood
📰 ArXiv cs.AI
arXiv:2605.11601v1 Announce Type: cross Abstract: Autoregressive language models are widely used for text evaluation. However, their left-to-right factorization introduces positional bias: early tokens are scored with only leftward context, which conflates architectural asymmetry with true text quality. We propose masked reconstruction as an alternative paradigm, in which every token is scored using full bidirectional context. We introduce DiffScore, an evaluation framework built on Masked Large D
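
The contrast the abstract draws can be written out as a rough sketch. The notation here is assumed for illustration and is not taken from the paper: $x = (x_1, \dots, x_T)$ is the text being scored, $p_\theta$ is an autoregressive model, $p_\phi$ is a masked model, and $x_{\setminus t}$ denotes the sequence with position $t$ masked. The second line is the generic one-token-at-a-time masked score (a pseudo-log-likelihood); DiffScore's actual definition may differ, since the abstract is cut off before it is given.

```latex
\begin{align}
  % Autoregressive score: token t is conditioned only on the tokens to its left,
  % so early positions are scored with very little context.
  S_{\mathrm{AR}}(x) &= \sum_{t=1}^{T} \log p_\theta\left(x_t \mid x_{<t}\right) \\
  % Masked-reconstruction score (assumed generic form): token t is conditioned
  % on every other token, left and right, regardless of its position.
  S_{\mathrm{MR}}(x) &= \sum_{t=1}^{T} \log p_\phi\left(x_t \mid x_{\setminus t}\right)
\end{align}
```

In the first sum the term for $t = 1$ conditions on nothing at all, while the term for $t = T$ conditions on the entire prefix; in the second sum every term conditions on $T - 1$ tokens, which is the positional symmetry the abstract refers to.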