Compute Aligned Training: Optimizing for Test Time Inference
📰 ArXiv cs.AI
arXiv:2604.24957v1 Announce Type: cross

Abstract: Scaling test-time compute has emerged as a powerful mechanism for enhancing Large Language Model (LLM) performance. However, standard post-training paradigms, Supervised Fine-Tuning (SFT) and Reinforcement Learning (RL), optimize the likelihood of individual samples under a base policy, creating a misalignment with test-time procedures that rely on aggregated or filtered outputs. In this work, we propose Compute Aligned Training, which aligns tra
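To make the misalignment concrete: test-time procedures such as majority voting (self-consistency) score the *aggregate* of N samples, not each sample's individual likelihood. The sketch below, a hypothetical illustration not taken from the paper, shows how an aggregated answer can be correct even when most individual samples are wrong:

```python
from collections import Counter

def majority_vote(samples):
    """Aggregate N sampled answers by majority vote (self-consistency)."""
    counts = Counter(samples)
    answer, _ = counts.most_common(1)[0]
    return answer

# Hypothetical example: only 3 of 5 samples are correct, yet the
# aggregated answer is right -- this aggregate, not per-sample
# likelihood, is what test-time inference actually evaluates.
samples = ["42", "17", "42", "42", "9"]
print(majority_vote(samples))  # -> 42
```

Training objectives that maximize per-sample likelihood do not directly optimize this aggregate quantity, which is the gap the paper's proposed training scheme targets.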