Evaluating Visual Prompts with Eye-Tracking Data for MLLM-Based Human Activity Recognition
📰 ArXiv cs.AI
arXiv:2604.09585v1 Announce Type: cross Abstract: Large Language Models (LLMs) have emerged as foundation models for IoT applications such as human activity recognition (HAR). However, directly feeding them high-frequency, multi-dimensional sensor data, such as eye-tracking data, leads to information loss and high token costs. To mitigate this, we investigate a visual prompting strategy that transforms sensor signals into data-visualization images as input to multimodal LLMs (MLLMs), using eye-tracking […]
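The abstract's core idea, rendering raw sensor signals as an image instead of streaming numeric tokens, can be sketched as follows. This is a minimal illustration, not the paper's actual pipeline: the synthetic gaze trace, figure layout, and base64 packaging (a common way to pass images to MLLM APIs) are all assumptions.

```python
import base64
import io

import matplotlib
matplotlib.use("Agg")  # headless rendering, no display needed
import matplotlib.pyplot as plt
import numpy as np


def gaze_to_image_b64(t, x, y):
    """Render an eye-tracking (gaze) time series as a PNG and
    return it base64-encoded, ready to attach as an MLLM image input.

    The plot style here is illustrative; the paper's exact
    visualization design is not specified in this excerpt.
    """
    fig, ax = plt.subplots(figsize=(4, 3), dpi=100)
    ax.plot(t, x, label="gaze x")
    ax.plot(t, y, label="gaze y")
    ax.set_xlabel("time (s)")
    ax.set_ylabel("normalized gaze position")
    ax.legend(loc="upper right")
    buf = io.BytesIO()
    fig.savefig(buf, format="png", bbox_inches="tight")
    plt.close(fig)
    return base64.b64encode(buf.getvalue()).decode("ascii")


# Synthetic 30 Hz gaze trace over 2 seconds (hypothetical data)
t = np.linspace(0.0, 2.0, 60)
x = 0.5 + 0.1 * np.sin(2 * np.pi * t)
y = 0.5 + 0.1 * np.cos(2 * np.pi * t)
img_b64 = gaze_to_image_b64(t, x, y)
```

A 2-second, 60-sample trace becomes one image rather than hundreds of numeric tokens, which is the token-cost trade-off the abstract describes.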