Learning Additively Compositional Latent Actions for Embodied AI
📰 ArXiv cs.AI
arXiv:2604.03340v1 (announce type: cross)
Abstract: Latent action learning infers pseudo-action labels from visual transitions, offering a way to leverage internet-scale video for embodied AI. However, most methods learn latent actions without structural priors that encode the additive, compositional structure of physical motion. As a result, the latents often entangle true state changes with irrelevant scene details or information about future observations, and they miscalibrate motion magnitude. We