Skeleton-to-Image Encoding: Enabling Skeleton Representation Learning via Vision-Pretrained Models

📰 ArXiv cs.AI

arXiv:2603.05963v2 Announce Type: replace-cross

Abstract: Recent advances in large-scale pretrained vision models have demonstrated impressive capabilities across a wide range of downstream tasks, including cross-modal and multi-modal scenarios. However, their direct application to 3D human skeleton data remains challenging due to fundamental differences in data format. Moreover, the scarcity of large-scale skeleton datasets and the need to incorporate skeleton data into multi-modal action recognition …

Published 23 Jun 2026