Efficient Emotion-Aware Iconic Gesture Prediction for Robot Co-Speech

📰 arXiv cs.AI

arXiv:2604.11417v1 Announce Type: cross

Abstract: Co-speech gestures increase engagement and improve speech understanding. Most data-driven robot systems generate rhythmic, beat-like motion, yet few integrate semantic emphasis. To address this, we propose a lightweight transformer that derives iconic gesture placement and intensity from text and emotion alone, requiring no audio input at inference time. The model outperforms GPT-4o in both semantic gesture placement classification and intensity regression.
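The abstract does not describe the architecture in detail, so the following is only an illustrative sketch, not the paper's code: a toy, pure-Python self-attention layer with two output heads, matching the two tasks the abstract names (per-token gesture-placement classification and scalar intensity regression). All names, dimensions, and the choice to condition on emotion by adding an emotion embedding to each token are assumptions.

```python
# Hypothetical sketch (not the paper's implementation): one self-attention
# layer over text-token embeddings, conditioned on an emotion embedding,
# with a classification head (gesture here or not) and a regression head
# (intensity in [0, 1]). Pure stdlib, toy-sized for clarity.
import math
import random

random.seed(0)
D = 8  # embedding size (assumed)

def linear(x, W):
    """Matrix-vector product x @ W, where W is a list of rows."""
    return [sum(xi * wij for xi, wij in zip(x, col)) for col in zip(*W)]

def softmax(xs):
    m = max(xs)
    es = [math.exp(v - m) for v in xs]
    s = sum(es)
    return [e / s for e in es]

def rand_mat(rows, cols):
    return [[random.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

class TinyGestureModel:
    def __init__(self):
        self.Wq = rand_mat(D, D)
        self.Wk = rand_mat(D, D)
        self.Wv = rand_mat(D, D)
        self.Wplace = rand_mat(D, 2)  # head 1: (no gesture, gesture) logits
        self.Wint = rand_mat(D, 1)    # head 2: intensity logit

    def forward(self, tokens, emotion):
        # Condition on emotion by adding its embedding to every token
        # (one simple choice; the paper may fuse text and emotion differently).
        xs = [[t + e for t, e in zip(tok, emotion)] for tok in tokens]
        qs = [linear(x, self.Wq) for x in xs]
        ks = [linear(x, self.Wk) for x in xs]
        vs = [linear(x, self.Wv) for x in xs]
        out = []
        for q in qs:
            # Scaled dot-product attention over all tokens.
            attn = softmax([sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(D)
                            for k in ks])
            ctx = [sum(a * v[j] for a, v in zip(attn, vs)) for j in range(D)]
            place = softmax(linear(ctx, self.Wplace))                # class probs
            intensity = 1 / (1 + math.exp(-linear(ctx, self.Wint)[0]))  # sigmoid
            out.append((place, intensity))
        return out
```

A usage example: `TinyGestureModel().forward([[0.1]*D, [0.2]*D], [0.05]*D)` returns, per token, a two-class placement distribution and an intensity in (0, 1). Because the model consumes only token and emotion embeddings, it needs no audio features at inference time, which is the property the abstract emphasizes.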

Published 14 Apr 2026