Improving Zero-Shot Offline RL via Behavioral Task Sampling
📰 ArXiv cs.AI
arXiv:2604.25496v1 Announce Type: new Abstract: Offline zero-shot reinforcement learning (RL) aims to learn agents that optimize unseen reward functions without additional environment interaction. The standard approach to this problem trains task-conditioned policies by sampling task vectors that define linear reward functions over learned state representations. In most existing algorithms, these task vectors are randomly sampled, implicitly assuming this adequately captures the structure of the
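The abstract describes task-conditioned policies trained by sampling task vectors that define linear rewards over learned state features. A minimal sketch of that setup, assuming a feature map `phi`, a latent dimension, and unit-sphere task sampling (all names here are hypothetical, not the paper's code):

```python
import numpy as np

rng = np.random.default_rng(0)
latent_dim = 8
state_dim = 4

# Stand-in for a learned state representation phi(s); here a fixed
# random projection followed by a nonlinearity.
W = rng.normal(size=(latent_dim, state_dim))

def phi(state):
    return np.tanh(W @ state)

def sample_task():
    # Common choice: sample the task vector z uniformly on the unit sphere
    # by normalizing a Gaussian draw.
    z = rng.normal(size=latent_dim)
    return z / np.linalg.norm(z)

def reward(state, z):
    # The task vector defines a reward that is linear in the learned features.
    return float(z @ phi(state))

z = sample_task()
s = np.ones(state_dim)
r = reward(s, z)
```

A task-conditioned policy would then take `z` as an additional input and be trained to maximize this induced reward; the paper's contribution concerns how the `z` vectors are sampled during training rather than this mechanics.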