PRAISE: Prefix-Based Rollout Reuse in Agentic Search Training

📰 arXiv cs.AI

arXiv:2604.03675v1 · Announce Type: new

Abstract: In agentic search, large language models (LLMs) are trained to perform multi-turn retrieval and reasoning for complex tasks such as multi-hop question answering (QA). However, current search-based reinforcement learning (RL) methods suffer from two core limitations: expensive long-horizon rollouts are under-utilized during training, and supervision is typically available only at the final answer, resulting in severe reward sparsity. We present Pref
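The abstract above is cut off before the method is described, but the title's core idea — reusing the shared prefix of expensive multi-turn rollouts instead of regenerating it — can be illustrated with a generic sketch. Everything below (the `PrefixCache` trie, the `rollout` helper, the toy step function) is a hypothetical illustration under that reading of the title, not the authors' actual implementation.

```python
# Hypothetical sketch of prefix-based rollout reuse: steps already executed
# for a shared trajectory prefix are skipped rather than regenerated.

class PrefixCache:
    """Trie keyed by rollout steps; records which step sequences have run."""

    def __init__(self):
        self.root = {}

    def longest_prefix(self, steps):
        """Return how many leading steps of `steps` are already cached."""
        node, depth = self.root, 0
        for step in steps:
            if step not in node:
                break
            node = node[step]
            depth += 1
        return depth

    def insert(self, steps):
        """Record a completed rollout so future rollouts can reuse it."""
        node = self.root
        for step in steps:
            node = node.setdefault(step, {})


def rollout(plan, cache, expensive_step_fn):
    """Execute `plan`, skipping the cached prefix; return #steps actually run."""
    start = cache.longest_prefix(plan)
    for step in plan[start:]:
        expensive_step_fn(step)  # stands in for an LLM call or retrieval
    cache.insert(plan)
    return len(plan) - start


calls = []
cache = PrefixCache()
n1 = rollout(("search q1", "read d3", "answer A"), cache, calls.append)
n2 = rollout(("search q1", "read d3", "answer B"), cache, calls.append)
print(n1, n2)  # → 3 1: the second rollout reuses the shared 2-step prefix
```

The second rollout shares its first two steps with the first, so only its final step triggers a fresh (expensive) call — the kind of saving the title suggests for long-horizon agentic-search training.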

Published 7 Apr 2026