Evaluating Visual Prompts with Eye-Tracking Data for MLLM-Based Human Activity Recognition

📰 ArXiv cs.AI

arXiv:2604.09585v1 (cross-listed) Abstract: Large Language Models (LLMs) have emerged as foundation models for IoT applications such as human activity recognition (HAR). However, directly feeding high-frequency, multi-dimensional sensor data, such as eye-tracking data, to LLMs leads to information loss and high token costs. To mitigate this, we investigate a visual prompting strategy that transforms sensor signals into data-visualization images as input to multimodal LLMs (MLLMs), using eye-tracking …
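
To make the idea concrete, here is a minimal sketch of the visual-prompting step the abstract describes: rendering a window of gaze samples as a scanpath image that an MLLM can consume as one image instead of hundreds of numeric text tokens. The synthetic data, function name, and plot styling below are illustrative assumptions; the paper's actual rendering pipeline and prompt format are not specified in this excerpt.

```python
import base64
import io

import matplotlib
matplotlib.use("Agg")  # headless rendering, no display needed
import matplotlib.pyplot as plt
import numpy as np


def gaze_window_to_png(xy: np.ndarray) -> bytes:
    """Render an (N, 2) array of normalized gaze coordinates as a scanpath PNG."""
    fig, ax = plt.subplots(figsize=(3, 3), dpi=100)
    ax.plot(xy[:, 0], xy[:, 1], lw=0.8, alpha=0.7)  # saccade path
    ax.scatter(xy[:, 0], xy[:, 1], s=4,
               c=np.arange(len(xy)), cmap="viridis")  # samples colored by time
    ax.set_xlim(0, 1)
    ax.set_ylim(0, 1)
    ax.axis("off")
    buf = io.BytesIO()
    fig.savefig(buf, format="png", bbox_inches="tight")
    plt.close(fig)
    return buf.getvalue()


# Hypothetical 2 s window at 120 Hz: a clipped random walk stands in for real gaze data.
rng = np.random.default_rng(0)
xy = np.clip(0.5 + np.cumsum(rng.normal(0, 0.01, (240, 2)), axis=0), 0, 1)

# One base64-encoded image can then be attached to an MLLM request in place
# of the raw sample stream, trading token cost for a visual summary.
image_b64 = base64.b64encode(gaze_window_to_png(xy)).decode("ascii")
```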

Published 14 Apr 2026