LMGenDrive: Bridging Multimodal Understanding and Generative World Modeling for End-to-End Driving

📰 ArXiv cs.AI

arXiv:2604.08719v1 Announce Type: cross

Abstract: Recent years have seen remarkable progress in autonomous driving, yet generalization to long-tail and open-world scenarios remains a major bottleneck for large-scale deployment. To address this challenge, some works use LLMs and VLMs for vision-language understanding and reasoning, enabling vehicles to interpret rare and safety-critical situations when generating actions. Others study generative world models to capture the spatio-temporal evoluti

Published 13 Apr 2026
