GazeQwen: Lightweight Gaze-Conditioned LLM Modulation for Streaming Video Understanding
📰 ArXiv cs.AI
GazeQwen is a lightweight approach for incorporating eye-gaze information into a multimodal large language model (MLLM) to improve streaming video understanding.
Action Steps
- Inject gaze cues as visual overlays or textual descriptions
- Encode gaze-conditioned video features with a compact gaze resampler
- Modulate the hidden states of an open-source MLLM with the encoded gaze signal
- Evaluate GazeQwen on streaming video understanding tasks
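The steps above can be sketched in code. This is a minimal, hypothetical illustration of the general idea (a compact resampler compressing gaze-weighted video features into a few tokens, then FiLM-style scale/shift modulation of LLM hidden states); all module names, dimensions, and the modulation scheme are assumptions, not the paper's actual implementation.

```python
import torch
import torch.nn as nn

class GazeResampler(nn.Module):
    """Hypothetical: compress gaze-weighted frame features into a few query tokens."""
    def __init__(self, feat_dim=256, n_queries=8, n_heads=4):
        super().__init__()
        self.queries = nn.Parameter(torch.randn(n_queries, feat_dim))
        self.attn = nn.MultiheadAttention(feat_dim, n_heads, batch_first=True)

    def forward(self, video_feats, gaze_saliency):
        # video_feats: (B, T, D); gaze_saliency: (B, T) per-frame weights in [0, 1]
        weighted = video_feats * gaze_saliency.unsqueeze(-1)
        q = self.queries.unsqueeze(0).expand(video_feats.size(0), -1, -1)
        out, _ = self.attn(q, weighted, weighted)  # cross-attend queries to frames
        return out  # (B, n_queries, D)

class GazeModulator(nn.Module):
    """Hypothetical: FiLM-style scale/shift of LLM hidden states from gaze tokens."""
    def __init__(self, feat_dim=256, hidden_dim=512):
        super().__init__()
        self.to_scale = nn.Linear(feat_dim, hidden_dim)
        self.to_shift = nn.Linear(feat_dim, hidden_dim)

    def forward(self, hidden, gaze_tokens):
        # hidden: (B, L, H) LLM hidden states; gaze_tokens: (B, n_queries, D)
        pooled = gaze_tokens.mean(dim=1)                    # (B, D)
        scale = 1 + self.to_scale(pooled).unsqueeze(1)      # (B, 1, H)
        shift = self.to_shift(pooled).unsqueeze(1)          # (B, 1, H)
        return hidden * scale + shift

# Toy shapes, not the real model's: 2 clips, 16 frames, 10 text tokens.
resampler = GazeResampler()
modulator = GazeModulator()
feats = torch.randn(2, 16, 256)   # per-frame visual features
gaze = torch.rand(2, 16)          # per-frame gaze saliency
hidden = torch.randn(2, 10, 512)  # stand-in LLM hidden states
out = modulator(hidden, resampler(feats, gaze))
print(out.shape)  # torch.Size([2, 10, 512])
```

Because only the small resampler and modulation layers are trained while the backbone MLLM stays frozen, this style of conditioning is parameter-efficient, which matches the article's framing.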
Who Needs to Know This
ML researchers and AI engineers: GazeQwen offers a parameter-efficient way to equip MLLMs with gaze awareness, enhancing their video understanding capabilities.
Key Insight
💡 Incorporating eye-gaze information into large language models can improve video understanding capabilities
Share This
💡 GazeQwen: a lightweight approach to gaze-conditioned LLM modulation for streaming video understanding
DeepCamp AI