GazeQwen: Lightweight Gaze-Conditioned LLM Modulation for Streaming Video Understanding

📰 ArXiv cs.AI

GazeQwen is a lightweight approach for incorporating eye-gaze information into multimodal large language models to improve streaming video understanding.

Advanced · Published 30 Mar 2026
Action Steps
  1. Provide gaze cues to the model as visual overlays on video frames or as text descriptions
  2. Implement a compact gaze resampler to encode video features
  3. Modulate the hidden states of an open-source MLLM with gaze information (see the sketch after this list)
  4. Evaluate the performance of GazeQwen on video understanding tasks
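The two model-side steps above (a compact gaze resampler and hidden-state modulation) can be sketched roughly as follows. This is a minimal illustration only: it assumes a Perceiver-style resampler over gaze-weighted patch features and a FiLM-style scale-and-shift on LLM hidden states, and all module names, dimensions, and the modulation form are placeholders, not the paper's actual architecture.

```python
# Minimal sketch, assuming a cross-attention gaze resampler and FiLM-style
# modulation. Names, shapes, and design choices are illustrative assumptions.
import torch
import torch.nn as nn

class GazeResampler(nn.Module):
    """Compress gaze-weighted video patch features into a few gaze tokens."""
    def __init__(self, feat_dim=1024, num_queries=8, num_heads=8):
        super().__init__()
        self.queries = nn.Parameter(torch.randn(num_queries, feat_dim) * 0.02)
        self.attn = nn.MultiheadAttention(feat_dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(feat_dim)

    def forward(self, patch_feats, gaze_heatmap):
        # patch_feats: (B, N, D) per-frame patch features; gaze_heatmap: (B, N)
        # Emphasize the patches the viewer actually fixated on.
        weighted = patch_feats * gaze_heatmap.unsqueeze(-1)
        q = self.queries.unsqueeze(0).expand(patch_feats.size(0), -1, -1)
        gaze_tokens, _ = self.attn(q, weighted, weighted)
        return self.norm(gaze_tokens)  # (B, num_queries, D)

class GazeModulator(nn.Module):
    """FiLM-style scale/shift of LLM hidden states driven by pooled gaze tokens."""
    def __init__(self, feat_dim=1024, hidden_dim=3584):
        super().__init__()
        self.to_scale_shift = nn.Linear(feat_dim, 2 * hidden_dim)

    def forward(self, hidden_states, gaze_tokens):
        # hidden_states: (B, T, H) from a frozen MLLM layer; gaze_tokens: (B, Q, D)
        scale, shift = self.to_scale_shift(gaze_tokens.mean(dim=1)).chunk(2, dim=-1)
        return hidden_states * (1 + scale.unsqueeze(1)) + shift.unsqueeze(1)

# Toy usage with random tensors standing in for real video features and gaze maps.
B, N, D, T, H = 2, 256, 1024, 32, 3584
resampler, modulator = GazeResampler(D), GazeModulator(D, H)
gaze_tokens = resampler(torch.randn(B, N, D), torch.rand(B, N))
modulated = modulator(torch.randn(B, T, H), gaze_tokens)
print(modulated.shape)  # torch.Size([2, 32, 3584])
```

Only the resampler and modulator would be trained in a setup like this, which is what makes the approach parameter-efficient: the underlying MLLM stays frozen while a small number of gaze tokens steer its hidden states.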
Who Needs to Know This

ML researchers and AI engineers can benefit from GazeQwen: it offers a parameter-efficient way to equip MLLMs with gaze awareness, enhancing their video understanding capabilities.

Key Insight

💡 Incorporating eye-gaze information into large language models can improve their video understanding capabilities.

Share This
💡 GazeQwen: a lightweight approach to gaze-conditioned LLM modulation for streaming video understanding