Aerial Vision-Language Navigation with a Unified Framework for Spatial, Temporal and Embodied Reasoning

📰 ArXiv cs.AI

arXiv:2512.08639v3

Abstract: Aerial Vision-and-Language Navigation (VLN) aims to enable unmanned aerial vehicles (UAVs) to interpret natural language instructions and navigate complex urban environments using onboard visual observation. This task holds promise for real-world applications such as low-altitude inspection, search-and-rescue, and autonomous aerial delivery. Existing methods often rely on panoramic images, depth inputs, or odometry to support spatial reas

Published 16 Apr 2026