Self-Guided Plan Extraction for Instruction-Following Tasks with Goal-Conditional Reinforcement Learning

📰 ArXiv cs.AI

arXiv:2604.20601v1. Abstract: We introduce SuperIgor, a framework for instruction-following tasks. Unlike prior methods that rely on predefined subtasks, SuperIgor enables a language model to generate and refine high-level plans through a self-learning mechanism, reducing the need for manual dataset annotation. Our approach involves iterative co-training: an RL agent is trained to follow the generated plans, while the language model adapts and modifies these plans based on RL f

Published 23 Apr 2026