3D-IDE: 3D Implicit Depth Emergent

📰 ArXiv cs.AI

arXiv:2604.03296v1
Announce Type: cross

Abstract: Leveraging 3D information within Multimodal Large Language Models (MLLMs) has recently shown significant advantages for indoor scene understanding. However, existing methods, including those using explicit ground-truth 3D positional encoding and those grafting external 3D foundation models for implicit geometry, struggle with the trade-off in 2D-3D representation fusion, leading to suboptimal deployment. To this end, we propose 3D-Implicit Depth

Published 7 Apr 2026