TransVLM: A Vision-Language Framework and Benchmark for Detecting Any Shot Transitions

📰 ArXiv cs.AI

arXiv:2604.27975v1
Announce Type: cross

Abstract: Traditional Shot Boundary Detection (SBD) inherently struggles with complex transitions by formulating the task around isolated cut points, frequently yielding corrupted video shots. We address this fundamental limitation by formalizing the Shot Transition Detection (STD) task. Rather than searching for ambiguous points, STD explicitly detects the continuous temporal segments of transitions. To tackle this, we propose TransVLM, a Vision-Language …

Published 1 May 2026
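The abstract's core distinction is between marking a transition as a single cut point (SBD) and marking it as a continuous segment (STD). The minimal Python sketch below illustrates why that matters. It is not TransVLM or any code from the paper. It assumes frames are indexed from 0, that a cut at frame `c` means a new shot starts at `c`, and that transitions are inclusive `(start, end)` frame intervals. The function names `shots_from_cuts`, `shots_from_transitions` and `contaminated_frames` are hypothetical, chosen for illustration only.

```python
from typing import List, Tuple

# Hypothetical representation: inclusive (start, end) frame indices.
Interval = Tuple[int, int]


def shots_from_cuts(num_frames: int, cuts: List[int]) -> List[Interval]:
    """SBD-style split: each cut index c starts a new shot at frame c."""
    shots: List[Interval] = []
    start = 0
    for c in sorted(cuts):
        shots.append((start, c - 1))
        start = c
    shots.append((start, num_frames - 1))
    return shots


def shots_from_transitions(num_frames: int, transitions: List[Interval]) -> List[Interval]:
    """STD-style split: drop every transition segment; the remaining runs are shots."""
    shots: List[Interval] = []
    start = 0
    for t_start, t_end in sorted(transitions):
        if t_start > start:  # keep the clean run before this transition
            shots.append((start, t_start - 1))
        start = max(start, t_end + 1)  # resume after the transition
    if start <= num_frames - 1:
        shots.append((start, num_frames - 1))
    return shots


def contaminated_frames(shots: List[Interval], transitions: List[Interval]) -> int:
    """Count frames that lie inside a shot and also inside a transition."""
    count = 0
    for s, e in shots:
        for ts, te in transitions:
            overlap = min(e, te) - max(s, ts) + 1
            if overlap > 0:
                count += overlap
    return count


if __name__ == "__main__":
    # Assumed toy clip: 100 frames with a dissolve spanning frames 40..49.
    dissolve = [(40, 49)]
    cut_shots = shots_from_cuts(100, [45])  # dissolve collapsed to its midpoint
    seg_shots = shots_from_transitions(100, dissolve)
    print(cut_shots, contaminated_frames(cut_shots, dissolve))
    print(seg_shots, contaminated_frames(seg_shots, dissolve))
```

With a 100-frame clip and a dissolve over frames 40 to 49, collapsing the dissolve to one cut at frame 45 gives shots `(0, 44)` and `(45, 99)`, which together still contain 10 blended transition frames. Treating the dissolve as a segment gives `(0, 39)` and `(50, 99)` with none, which is the kind of "corrupted shot" the segment formulation is meant to avoid.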
