Distributionally Robust Token Optimization in RLHF
📰 ArXiv cs.AI
arXiv:2604.08577v1 Announce Type: cross Abstract: Large Language Models (LLMs) tend to respond correctly to prompts that align with the data they were trained and fine-tuned on. Yet small shifts in wording, format, or language can trigger surprisingly large failures, especially on multi-step reasoning problems. To address this problem, we propose a Distributionally Robust Token Optimization (DRTO) approach, which combines token-level Reinforcement Learning from Human Feedback (RLHF) with Distribu