LVSum: A Benchmark for Timestamp-Aware Long Video Summarization

📰 ArXiv cs.AI

arXiv:2604.10024v1 (announce type: cross)

Abstract: Long video summarization presents significant challenges for current multimodal large language models (MLLMs), particularly in maintaining temporal fidelity over extended durations and producing summaries that are both semantically and temporally grounded. In this work, we present LVSum, a human-annotated benchmark designed specifically for evaluating long video summarization with fine-grained temporal alignment. LVSum comprises diverse long-form

Published 14 Apr 2026
