Aerial Vision-Language Navigation with a Unified Framework for Spatial, Temporal and Embodied Reasoning
arXiv cs.AI
arXiv:2512.08639v3 (Announce Type: replace-cross)

Abstract: Aerial Vision-and-Language Navigation (VLN) aims to enable unmanned aerial vehicles (UAVs) to interpret natural-language instructions and navigate complex urban environments using onboard visual observations. The task holds promise for real-world applications such as low-altitude inspection, search-and-rescue, and autonomous aerial delivery. Existing methods often rely on panoramic images, depth inputs, or odometry to support spatial reasoning…