RationalRewards: Reasoning Rewards Scale Visual Generation at Both Training and Test Time
📰 ArXiv cs.AI
arXiv:2604.11626v1 Announce Type: new Abstract: Most reward models for visual generation reduce rich human judgments to a single unexplained score, discarding the reasoning that underlies preference. We show that teaching reward models to produce explicit, multi-dimensional critiques before scoring transforms them from passive evaluators into active optimization tools, improving generators in two complementary ways: at training time, structured rationales provide interpretable, fine-grained rewards…
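To make the idea concrete, here is a minimal sketch of the critique-then-score pattern the abstract describes; the names (`DimensionCritique`, `aggregate_reward`) and the weighting scheme are illustrative assumptions, not details from the paper. A reward model emits a per-dimension rationale and score, and the per-dimension scores are aggregated into the scalar reward used to train the generator.

```python
# Illustrative sketch (assumed names, not from the paper): a reward model
# produces an explicit rationale and a score per evaluation dimension,
# then the scores are aggregated into one scalar training reward.
from dataclasses import dataclass

@dataclass
class DimensionCritique:
    dimension: str   # e.g. "composition", "prompt alignment"
    rationale: str   # explicit reasoning produced before the score
    score: float     # per-dimension score in [0, 1]

def aggregate_reward(critiques, weights=None):
    """Combine per-dimension scores into a single scalar reward."""
    if weights is None:
        weights = {c.dimension: 1.0 for c in critiques}  # uniform by default
    total = sum(weights[c.dimension] for c in critiques)
    return sum(weights[c.dimension] * c.score for c in critiques) / total

critiques = [
    DimensionCritique("composition", "subject is centered and balanced", 0.9),
    DimensionCritique("alignment", "prompt asks for two cats; only one shown", 0.4),
]
print(round(aggregate_reward(critiques), 2))  # → 0.65
```

The key contrast with a single unexplained score is that each number arrives with a rationale, so the reward stays interpretable and can be weighted per dimension.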