Learning Humanoid Navigation from Human Data
📰 ArXiv cs.AI
EgoNav learns humanoid navigation from 5 hours of human walking data using a diffusion model and visual memory
Action Steps
- Collect human walking data to train the diffusion model
- Implement a 360° visual memory that fuses color, depth, and semantics
- Use video features from a frozen DINOv3 backbone to capture appearance cues
- Test and refine the EgoNav system in various environments
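The visual-memory step can be pictured with a small sketch. The source does not describe how EgoNav stores its memory, so everything here is an assumption made for illustration: one bin per degree of azimuth, five channels per bin (RGB, depth, semantic class id), and the hypothetical helpers `make_memory`, `update_memory`, and `coverage`.

```python
import numpy as np

N_BINS = 360  # assumed resolution: one bin per degree of azimuth


def make_memory():
    # Channels: R, G, B, depth, semantic class id. NaN marks "never observed".
    return np.full((N_BINS, 5), np.nan)


def update_memory(memory, yaw_deg, fov_deg, color, depth, semantics):
    """Write one observation into the azimuth bins it covers.

    color: (n_cols, 3), depth: (n_cols,), semantics: (n_cols,)
    Each image column is mapped to a world-frame azimuth from the
    camera heading `yaw_deg` and the horizontal field of view `fov_deg`.
    """
    n_cols = depth.shape[0]
    offsets = np.linspace(-fov_deg / 2, fov_deg / 2, n_cols)
    bins = np.round(yaw_deg + offsets).astype(int) % N_BINS
    memory[bins, 0:3] = color
    memory[bins, 3] = depth
    memory[bins, 4] = semantics
    return memory


def coverage(memory):
    # Fraction of azimuth bins that have been observed at least once.
    return float(np.mean(~np.isnan(memory[:, 3])))


memory = make_memory()
# Look forward, then turn around: two 90-degree views fill half the ring.
for yaw in (0.0, 180.0):
    memory = update_memory(
        memory, yaw, 90.0,
        color=np.zeros((91, 3)), depth=np.ones(91), semantics=np.zeros(91),
    )
print(coverage(memory))
```

Newer observations overwrite older ones in the same bin. A real system would likely also need to account for the walker's translation between frames, which this sketch ignores.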
Who Needs to Know This
Robotics engineers and AI researchers: EgoNav lets a humanoid robot navigate diverse, unseen environments after training on only 5 hours of human walking data, with no robot data or finetuning. Product managers: the same property makes it worth evaluating for real-world deployments.
Key Insight
💡 A diffusion model trained only on human walking data can predict a distribution of plausible future trajectories for a humanoid robot
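To make "predicting a distribution of trajectories" concrete, here is a minimal sketch of standard DDPM ancestral sampling over future (x, y) waypoints. It is not EgoNav's implementation: the step count, the noise schedule, and the function names are assumptions, and `toy_eps_model` is a hypothetical stand-in for a trained network that conditions only on the past trajectory (the paper also conditions on visual memory and video features).

```python
import numpy as np

T = 50                                # number of diffusion steps (assumed)
betas = np.linspace(1e-4, 0.05, T)    # assumed linear noise schedule
alphas = 1.0 - betas
alpha_bars = np.cumprod(alphas)


def toy_eps_model(x_t, t, past):
    """Hypothetical stand-in for a trained noise predictor.

    It pretends the only plausible future is a constant-velocity
    continuation of `past` and returns the noise implied by that guess.
    """
    velocity = past[-1] - past[-2]
    steps = np.arange(1, x_t.shape[-2] + 1)[:, None]
    x0_guess = past[-1] + steps * velocity
    return (x_t - np.sqrt(alpha_bars[t]) * x0_guess) / np.sqrt(1.0 - alpha_bars[t])


def sample_trajectories(past, horizon, n_samples, eps_model, rng):
    """Draw `n_samples` future trajectories of shape (horizon, 2)."""
    x = rng.standard_normal((n_samples, horizon, 2))  # start from pure noise
    for t in reversed(range(T)):
        eps = eps_model(x, t, past)
        # DDPM posterior mean given the predicted noise
        mean = (x - betas[t] / np.sqrt(1.0 - alpha_bars[t]) * eps) / np.sqrt(alphas[t])
        # No noise is added on the final step
        noise = rng.standard_normal(x.shape) if t > 0 else 0.0
        x = mean + np.sqrt(betas[t]) * noise
    return x


past = np.array([[0.0, 0.0], [0.5, 0.0]])  # two past waypoints, 0.5 units apart
samples = sample_trajectories(past, horizon=4, n_samples=3,
                              eps_model=toy_eps_model, rng=np.random.default_rng(0))
print(samples.shape)
```

Because the toy predictor believes in a single future, every sample collapses to the same straight-line continuation. A trained network would instead yield varied samples, for example some passing left and some passing right of an obstacle.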
Full Article
Title: Learning Humanoid Navigation from Human Data
Abstract:
arXiv:2604.00416v1 (announce type: cross)
We present EgoNav, a system that enables a humanoid robot to traverse diverse, unseen environments by learning entirely from 5 hours of human walking data, with no robot data or finetuning. A diffusion model predicts distributions of plausible future trajectories conditioned on past trajectory, a 360° visual memory fusing color, depth, and semantics, and video features from a frozen DINOv3 backbone that capture appearance cues invisible to dep