Learning Structural Latent Points for Efficient Visual Representations in Robotic Manipulation

📰 ArXiv cs.AI

arXiv:2605.21258v1 (Announce Type: cross)

Abstract: Current 3D-aware pretraining methods for embodied perception and manipulation are largely built on differentiable rendering frameworks, producing either fully implicit neural fields or fully explicit geometric primitives. Implicit representations, while expressive, lack explicit structural cues, whereas explicit ones preserve geometry but suffer from resolution limits and weak generalization. To address these limitations, we propose a novel pretr

Published 21 May 2026
