Understanding Asynchronous Inference Methods for Vision-Language-Action Models

📰 ArXiv cs.AI

arXiv:2605.08168v1 Announce Type: cross

Abstract: Vision-Language-Action (VLA) models offer a promising path to generalist robot control, but their inference latency causes observation staleness when generated actions are executed asynchronously. Several methods have been proposed concurrently to mitigate this problem: inference-time inpainting (IT-RTC), training-time delay simulation (TT-RTC), future-state-aware conditioning (VLASH), and lightweight residual correction (A2C2). Each takes a fund

Published 12 May 2026