SparseBalance: Load-Balanced Long Context Training with Dynamic Sparse Attention

📰 ArXiv cs.AI

arXiv:2604.13847v1 Announce Type: cross Abstract: While sparse attention mitigates the computational bottleneck of long-context LLM training, its distributed training process exhibits extreme heterogeneity in both 1) sequence length and 2) sparsity sensitivity, leading to a severe imbalance problem and sub-optimal model accuracy. Existing algorithms and training frameworks typically focus on a single issue, failing to systematically co-optimize these two problems. Therefore, we p

Published 16 Apr 2026
