UMI-3D: Extending Universal Manipulation Interface from Vision-Limited to 3D Spatial Perception

arXiv cs.AI

arXiv:2604.14089v1 · Announce Type: cross

Abstract: We present UMI-3D, a multimodal extension of the Universal Manipulation Interface (UMI) for robust and scalable data collection in embodied manipulation. While UMI enables portable, wrist-mounted data acquisition, its reliance on monocular visual SLAM makes it vulnerable to occlusions, dynamic scenes, and tracking failures, limiting its applicability in real-world environments. UMI-3D addresses these limitations by introducing a lightweight and l…

Published 16 Apr 2026