Scalable and Explainable Learner-Video Interaction Prediction using Multimodal Large Language Models

arXiv cs.AI

arXiv:2604.04482v1 (Announce Type: new)

Abstract: Learners' use of video controls in educational videos provides implicit signals of cognitive processing and instructional design quality. Yet the lack of scalable and explainable predictive models limits instructors' ability to anticipate such behavior before deployment. We propose a scalable, interpretable pipeline for predicting population-level watching, pausing, skipping, and rewinding behavior as proxies for cognitive load from video content a…

Published 7 Apr 2026
