Compute Aligned Training: Optimizing for Test Time Inference

📰 ArXiv cs.AI

arXiv:2604.24957v1 (cross-listing). Abstract: Scaling test-time compute has emerged as a powerful mechanism for enhancing Large Language Model (LLM) performance. However, standard post-training paradigms, Supervised Fine-Tuning (SFT) and Reinforcement Learning (RL), optimize the likelihood of individual samples under a base policy, creating a misalignment with test-time procedures that rely on aggregated or filtered outputs. In this work, we propose Compute Aligned Training, which aligns training […]
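The abstract is cut off before describing the method itself, but the misalignment it names is concrete: training scores each sample independently, while at test time the model's outputs are often aggregated, e.g. by majority voting over many samples (self-consistency). The sketch below illustrates that aggregation step only; it is not the paper's Compute Aligned Training objective, and the sample answers are hypothetical.

```python
from collections import Counter

def majority_vote(samples: list[str]) -> str:
    """Return the most common answer among sampled candidates.

    This is the kind of test-time aggregation (voting/filtering over
    many samples) that per-sample likelihood training does not directly
    optimize for, per the abstract's argument.
    """
    counts = Counter(samples)
    answer, _ = counts.most_common(1)[0]
    return answer

# Hypothetical candidate answers drawn from a model for one question:
samples = ["42", "41", "42", "42", "7"]
print(majority_vote(samples))  # -> 42
```

Under voting, a policy that places 40% probability on the correct answer and scatters the rest can outperform one with higher average per-sample likelihood, which is why an objective aligned with the aggregation procedure can differ from SFT/RL.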

Published 29 Apr 2026