AdaptiveLoad: Towards Efficient Video Diffusion Transformer Training
📰 ArXiv cs.AI
arXiv:2605.17923v1 (cross-listed). Abstract: In video generation models, particularly world models, training large-scale video diffusion Transformers (such as DiT and MMDiT) poses significant computational challenges due to the extreme variance in sequence lengths within mixed-mode datasets. Existing bucket-based data loading strategies typically rely on "equal token length" constraints. This approach fails to account for the quadratic complexity of self-attention mechanisms, leading to sev
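To make the point about "equal token length" bucketing concrete, here is a minimal sketch, not the paper's method. It assumes attention is computed per sample (no cross-sample attention) and that cost scales with the square of each sequence length, ignoring linear terms. The batch contents and the helper names `token_count` and `attention_cost` are hypothetical.

```python
def token_count(seq_lens):
    """Total tokens in a batch: the quantity equal-token bucketing balances."""
    return sum(seq_lens)


def attention_cost(seq_lens):
    """Relative self-attention cost, assumed proportional to the sum of n^2."""
    return sum(n * n for n in seq_lens)


# Two hypothetical batches with the same token budget (32768 tokens each).
short_batch = [1024] * 32  # many short clips
long_batch = [32768]       # one long clip

same_budget = token_count(short_batch) == token_count(long_batch)
cost_ratio = attention_cost(long_batch) / attention_cost(short_batch)

print(same_budget)  # True
print(cost_ratio)   # 32.0
```

Under these assumptions, the two batches look identical to a token-count balancer, yet the long-sequence batch carries 32 times the attention cost, which is the kind of imbalance the abstract attributes to ignoring quadratic complexity.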