OSCBench: Benchmarking Object State Change in Text-to-Video Generation
📰 ArXiv cs.AI
arXiv:2603.11698v2
Abstract: Text-to-video (T2V) generation models have made rapid progress in producing visually high-quality and temporally coherent videos. However, existing benchmarks primarily focus on perceptual quality, text-video alignment, or physical plausibility, leaving a critical aspect of action understanding largely unexplored: object state change (OSC) explicitly specified in the text prompt. OSC refers to the transformation of an object's state induc