Consolidation or Adaptation? PRISM: Disentangling SFT and RL Data via Gradient Concentration

📰 ArXiv cs.AI

arXiv:2601.07224v2 Announce Type: replace Abstract: While hybrid Supervised Fine-Tuning (SFT) followed by Reinforcement Learning (RL) has become the standard paradigm for training LLM agents, effective mechanisms for allocating data between these stages remain largely underexplored. Current data arbitration strategies often rely on surface-level heuristics that fail to diagnose intrinsic learning needs. Since SFT targets pattern consolidation through imitation while RL drives structural adaptation …
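The abstract is cut off before the method is described, but the core idea it names, scoring training examples by how concentrated their gradients are, can be illustrated with a minimal sketch. Everything below is an assumption for illustration only (the function name, the top-fraction metric, and the routing direction are not from the paper): one simple concentration measure is the fraction of squared gradient mass carried by the largest coordinates.

```python
import numpy as np

def gradient_concentration(grad: np.ndarray, top_frac: float = 0.01) -> float:
    """Illustrative concentration score: fraction of squared gradient
    mass carried by the top `top_frac` of coordinates.
    Close to 1.0 = a few parameters dominate (peaky gradient);
    close to `top_frac` = mass spread evenly (diffuse gradient)."""
    sq = grad ** 2
    k = max(1, int(len(sq) * top_frac))
    top_mass = np.sort(sq)[::-1][:k].sum()
    return float(top_mass / sq.sum())

rng = np.random.default_rng(0)

# A "peaky" per-example gradient: a handful of coordinates dominate.
peaky = np.zeros(1000)
peaky[:5] = 10.0
peaky += 0.01 * rng.standard_normal(1000)

# A "diffuse" per-example gradient: mass spread across all coordinates.
diffuse = rng.standard_normal(1000)

print(gradient_concentration(peaky) > gradient_concentration(diffuse))
```

How such a score would then map examples to the SFT versus RL stage is exactly what the (truncated) abstract promises to disentangle; the direction of that mapping is not recoverable from this snippet.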

Published 14 Apr 2026