OSCBench: Benchmarking Object State Change in Text-to-Video Generation

📰 ArXiv cs.AI

arXiv:2603.11698v2 · Announce Type: replace-cross

Abstract: Text-to-video (T2V) generation models have made rapid progress in producing visually high-quality and temporally coherent videos. However, existing benchmarks primarily focus on perceptual quality, text-video alignment, or physical plausibility, leaving a critical aspect of action understanding largely unexplored: object state change (OSC) explicitly specified in the text prompt. OSC refers to the transformation of an object's state induc

Published 20 Apr 2026