MapDream: Task-Driven Map Learning for Vision-Language Navigation

📰 ArXiv cs.AI

arXiv:2602.00222v3

Abstract: Vision-Language Navigation (VLN) requires agents to follow natural language instructions in partially observed 3D environments, motivating map representations that aggregate spatial context beyond local perception. However, most existing approaches rely on hand-crafted maps constructed independently of the navigation policy. We argue that maps should instead be learned representations shaped directly by navigation objectives rather than e

Published 16 Jun 2026