Reward Is Enough: LLMs Are In-Context Reinforcement Learners

📰 ArXiv cs.AI

Large language models can learn through reinforcement learning at inference time, a phenomenon known as in-context reinforcement learning (ICRL).

Published 26 Mar 2026
Action Steps
  1. Introduce a simple multi-round prompting framework, ICRL prompting, to guide LLMs towards self-improvement
  2. Use ICRL prompting to reveal the in-context RL capability of LLMs
  3. Apply ICRL to various tasks to demonstrate its effectiveness
  4. Analyze the results to understand the limitations and potential of ICRL
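The steps above can be sketched as a minimal multi-round prompting loop. All names here (`icrl_prompting`, `toy_llm`, the prompt wording) are illustrative assumptions, not the paper's actual implementation: each round, the model's prior attempts and their scalar rewards are fed back into the prompt, and the model proposes a new answer.

```python
from typing import Callable, List, Tuple

def icrl_prompting(
    llm: Callable[[str], str],          # hypothetical LLM call: prompt -> answer
    reward_fn: Callable[[str], float],  # task-specific scalar reward
    task: str,
    rounds: int = 4,
) -> Tuple[str, float]:
    """Sketch of ICRL prompting: past (attempt, reward) pairs go back
    into the context, and the model is asked to improve on them."""
    history: List[Tuple[str, float]] = []
    for _ in range(rounds):
        feedback = "\n".join(f"Attempt: {a}\nReward: {r}" for a, r in history)
        prompt = (
            f"Task: {task}\n"
            f"{feedback}\n"
            "Propose a better answer than your previous attempts."
        )
        answer = llm(prompt)
        history.append((answer, reward_fn(answer)))
    # Return the best-scoring attempt seen across all rounds.
    return max(history, key=lambda ar: ar[1])

# Toy stand-in for an LLM: it counts prior attempts in the prompt and
# guesses successively higher numbers (5, 6, 7, 8 across four rounds).
def toy_llm(prompt: str) -> str:
    return str(prompt.count("Attempt:") + 5)

# Reward prefers answers closer to the target value 7.
best, best_reward = icrl_prompting(toy_llm, lambda a: -abs(int(a) - 7), "guess 7")
```

With the toy model, the loop converges on the highest-reward attempt ("7") even though no weights are updated, which is the core claim: the reward signal alone, delivered in-context, is enough to drive improvement.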
Who Needs to Know This

AI engineers and ML researchers can benefit from understanding ICRL, since it can improve LLM performance across a range of tasks; product managers can leverage the same capability to build more effective language-based products.

Key Insight

💡 LLMs can emerge as in-context reinforcement learners during inference time, enabling self-improvement without explicit training
