GazeQwen: Lightweight Gaze-Conditioned LLM Modulation for Streaming Video Understanding
📰 ArXiv cs.AI
GazeQwen is a lightweight approach for incorporating eye-gaze information into a multimodal large language model (MLLM) to improve streaming video understanding.
Action Steps
- Inject gaze cues as visual overlays or textual descriptions
- Encode gaze-conditioned video features with a compact gaze resampler
- Modulate the hidden states of an open-source MLLM with the encoded gaze signal
- Evaluate GazeQwen on streaming video understanding tasks
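The steps above can be sketched in code. This is a minimal, hypothetical illustration of the general idea (a compact resampler compressing gaze-weighted video features into a few tokens, then FiLM-style scale/shift modulation of LLM hidden states); all module names, dimensions, and the modulation scheme are assumptions, not the paper's actual implementation.

```python
import torch
import torch.nn as nn

class GazeResampler(nn.Module):
    """Hypothetical: compress gaze-weighted frame features into a few query tokens."""
    def __init__(self, feat_dim=256, n_queries=8, n_heads=4):
        super().__init__()
        self.queries = nn.Parameter(torch.randn(n_queries, feat_dim))
        self.attn = nn.MultiheadAttention(feat_dim, n_heads, batch_first=True)

    def forward(self, video_feats, gaze_saliency):
        # video_feats: (B, T, D); gaze_saliency: (B, T) per-frame weights in [0, 1]
        weighted = video_feats * gaze_saliency.unsqueeze(-1)
        q = self.queries.unsqueeze(0).expand(video_feats.size(0), -1, -1)
        out, _ = self.attn(q, weighted, weighted)  # cross-attend queries to frames
        return out  # (B, n_queries, D)

class GazeModulator(nn.Module):
    """Hypothetical: FiLM-style scale/shift of LLM hidden states from gaze tokens."""
    def __init__(self, feat_dim=256, hidden_dim=512):
        super().__init__()
        self.to_scale = nn.Linear(feat_dim, hidden_dim)
        self.to_shift = nn.Linear(feat_dim, hidden_dim)

    def forward(self, hidden, gaze_tokens):
        # hidden: (B, L, H) LLM hidden states; gaze_tokens: (B, n_queries, D)
        pooled = gaze_tokens.mean(dim=1)                    # (B, D)
        scale = 1 + self.to_scale(pooled).unsqueeze(1)      # (B, 1, H)
        shift = self.to_shift(pooled).unsqueeze(1)          # (B, 1, H)
        return hidden * scale + shift

# Toy shapes, not the real model's: 2 clips, 16 frames, 10 text tokens.
resampler = GazeResampler()
modulator = GazeModulator()
feats = torch.randn(2, 16, 256)   # per-frame visual features
gaze = torch.rand(2, 16)          # per-frame gaze saliency
hidden = torch.randn(2, 10, 512)  # stand-in LLM hidden states
out = modulator(hidden, resampler(feats, gaze))
print(out.shape)  # torch.Size([2, 10, 512])
```

Because only the small resampler and modulation layers are trained while the backbone MLLM stays frozen, this style of conditioning is parameter-efficient, which matches the article's framing.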
Who Needs to Know This
ML researchers and AI engineers: GazeQwen offers a parameter-efficient way to equip MLLMs with gaze awareness, enhancing their video understanding capabilities.
Key Insight
💡 Incorporating eye-gaze information into large language models can improve video understanding capabilities
Share This
💡 GazeQwen: a lightweight approach to gaze-conditioned LLM modulation for streaming video understanding
DeepCamp AI