Learning from Contrasts: Synthesizing Reasoning Paths from Diverse Search Trajectories
ArXiv cs.AI
arXiv:2604.11365v1. Abstract: Monte Carlo Tree Search (MCTS) has been widely used for automated reasoning data exploration, but current supervision extraction methods remain inefficient. Standard approaches retain only the single highest-reward trajectory, discarding the comparative signals present in the many explored paths. Here we introduce Contrastive Reasoning Path Synthesis (CRPS), a framework that transforms supervision extraction from a filtering process into a