SHE: Stepwise Hybrid Examination Reinforcement Learning Framework for E-commerce Search Relevance

📰 ArXiv cs.AI

arXiv:2510.07972v3

Abstract: Query-product relevance prediction is vital for AI-driven e-commerce, yet current LLM-based approaches face a dilemma: SFT and DPO struggle with long-tail generalization due to coarse supervision, while traditional RLVR suffers from sparse feedback that fails to correct intermediate reasoning errors. We propose Stepwise Hybrid Examination (SHE), an RL framework that ensures logical consistency through Stepwise Reward Policy Optimization (SRPO).

Published 14 Apr 2026