FOCA: Future-Oriented Conditioning for Data-Efficient Vision-Language-Action Adaptation

arXiv cs.AI

arXiv:2606.20867v1 · Announce Type: cross

Abstract: Vision-Language-Action (VLA) models enable general-purpose robotic control via large-scale multimodal pretraining, yet their effectiveness under few-shot imitation learning remains limited. We conduct a systematic stress test of state-of-the-art VLA models and show that performance degrades sharply as the number of demonstrations is reduced, revealing a key weakness of existing adaptation strategies. To address this, we introduce FOCA, a future-oriented conditioning …

Published 23 Jun 2026
