RationalRewards: Reasoning Rewards Scale Visual Generation Both Training and Test Time

📰 ArXiv cs.AI

arXiv:2604.11626v1 Announce Type: new

Abstract: Most reward models for visual generation reduce rich human judgments to a single unexplained score, discarding the reasoning that underlies preference. We show that teaching reward models to produce explicit, multi-dimensional critiques before scoring transforms them from passive evaluators into active optimization tools, improving generators in two complementary ways: at training time, structured rationales provide interpretable, fine-grained rewa…
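As a rough illustration of the idea described above (a sketch, not the paper's actual method), a critique-before-scoring reward model can be pictured as emitting per-dimension rationales and scores, then collapsing them into one scalar reward for the generator. All names and the aggregation rule here are assumptions for illustration:

```python
from dataclasses import dataclass

@dataclass
class Critique:
    dimension: str   # hypothetical axis, e.g. "prompt fidelity"
    rationale: str   # the explicit reasoning behind the score
    score: float     # per-dimension score in [0, 1]

def aggregate_reward(critiques: list[Critique]) -> float:
    """Collapse multi-dimensional critiques into a single scalar reward
    (simple mean here; the actual aggregation is an assumption)."""
    return sum(c.score for c in critiques) / len(critiques)

critiques = [
    Critique("prompt fidelity", "all requested objects are present", 0.9),
    Critique("composition", "subject is off-center and cropped", 0.4),
]
print(round(aggregate_reward(critiques), 2))  # 0.65
```

The point of the structure, per the abstract, is that the per-dimension rationales stay inspectable instead of vanishing into the final scalar.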

Published 14 Apr 2026