Rethinking Video Human-Object Interaction: Set Prediction over Time for Unified Detection and Anticipation
📰 ArXiv cs.AI
arXiv:2604.10397v1 (announce type: cross)
Abstract: Video-based human-object interaction (HOI) understanding requires both detecting ongoing interactions and anticipating their future evolution. However, existing methods usually treat anticipation as a downstream forecasting task built on externally constructed human-object pairs, limiting joint reasoning between detection and prediction. In addition, sparse keyframe annotations in current benchmarks can temporally misalign nominal future labels f