You Only Judge Once: Multi-response Reward Modeling in a Single Forward Pass

📰 ArXiv cs.AI

arXiv:2604.10966v1, Announce Type: cross

Abstract: We present a discriminative multimodal reward model that scores all candidate responses in a single forward pass. Conventional discriminative reward models evaluate each response independently, requiring multiple forward passes, one for each potential response. Our approach concatenates multiple responses with separator tokens and applies cross-entropy over their scalar scores, enabling direct comparative reasoning and efficient $N$-way preferenc
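The mechanism the abstract describes (concatenate candidates with separator tokens, then apply cross-entropy over their scalar scores) can be illustrated with a minimal sketch. This is not the authors' implementation: the `<sep>` token string, the helper names `build_input` and `nway_cross_entropy`, and the choice to read one score at each separator position are all assumptions made for illustration, and a real model would produce the scores from its hidden states rather than take them as given.

```python
import math

# Hypothetical separator token; the abstract does not name the actual token.
SEP = "<sep>"


def build_input(prompt_tokens, responses):
    """Concatenate the prompt and every candidate response into one sequence.

    Each response is followed by SEP. Returns the token list and the index of
    each SEP, where (by assumption) one scalar score per response is read out.
    """
    tokens = list(prompt_tokens)
    sep_positions = []
    for response in responses:
        tokens.extend(response)
        tokens.append(SEP)
        sep_positions.append(len(tokens) - 1)
    return tokens, sep_positions


def nway_cross_entropy(scores, preferred):
    """Softmax cross-entropy over N scalar scores with one preferred index."""
    top = max(scores)  # subtract the max for numerical stability
    log_z = top + math.log(sum(math.exp(s - top) for s in scores))
    return log_z - scores[preferred]


# Three candidates share one sequence, so one forward pass covers all of them.
tokens, sep_positions = build_input(["q1", "q2"], [["a"], ["b", "c"], ["d"]])

# Placeholder scores standing in for the model's per-response outputs.
uniform_loss = nway_cross_entropy([0.0, 0.0, 0.0], preferred=1)
confident_loss = nway_cross_entropy([0.0, 2.0, 0.0], preferred=1)
```

With equal scores the loss is $\log N$ (here $\log 3$), and it falls as the preferred response's score rises relative to the others, which is what makes the objective comparative across candidates rather than independent per response.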

Published 14 Apr 2026