Learning Additively Compositional Latent Actions for Embodied AI

📰 ArXiv cs.AI

arXiv:2604.03340v1 Announce Type: cross Abstract: Latent action learning infers pseudo-action labels from visual transitions, offering a way to leverage internet-scale video for embodied AI. However, most methods learn latent actions without structural priors that encode the additive, compositional structure of physical motion. As a result, the latents often entangle true state changes with irrelevant scene details or information about future observations, and they miscalibrate motion magnitude. We

Published 7 Apr 2026
