Hacking Generative Perplexity: Why Unconditional Text Evaluation Needs Distributional Metrics
📰 ArXiv cs.AI
arXiv:2606.08417v1 Announce Type: cross Abstract: Diffusion and continuous flow-based language models have emerged as the leading non-autoregressive alternatives to language modeling. Progress in both paradigms is overwhelmingly tracked by generative perplexity (gen-PPL): the per-token negative log-likelihood of samples under a frozen autoregressive (AR) scorer such as gpt2-large, typically paired with an empirical-entropy guardrail to rule out low-entropy collapse. We argue that this metric is
DeepCamp AI