On the Trainability of Masked Diffusion Language Models via Blockwise Locality

📰 ArXiv cs.AI

arXiv:2604.24832v1 (announce type: cross)

Abstract: Masked diffusion language models (MDMs) have recently emerged as a promising alternative to standard autoregressive large language models (AR-LLMs), yet their optimization can be substantially less stable. We study blockwise MDMs and compare them with AR-LLMs on three controlled tasks that stress different aspects of structured generation: in-context linear regression, graph path-finding, and Sudoku solving. We find that standard random-masking MDMs …
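For context, below is a minimal sketch of the random-masking MDM training objective the abstract refers to: sample a masking ratio per sequence, mask tokens independently at that rate, and score cross-entropy on the masked positions with the usual 1/t ELBO weighting. The `MASK_ID` constant and the `model` interface are illustrative assumptions, not the paper's actual setup.

```python
import torch
import torch.nn.functional as F

MASK_ID = 0  # hypothetical mask-token id; real vocabularies differ


def mdm_loss(model, tokens):
    """One random-masking MDM training step (illustrative sketch).

    tokens: LongTensor [batch, seq_len] of target token ids.
    model:  callable mapping ids [batch, seq_len] -> logits [batch, seq_len, vocab].
    """
    b, n = tokens.shape
    # Sample a masking ratio t ~ U(0, 1) per sequence.
    t = torch.rand(b, 1, device=tokens.device)
    # Mask each position independently with probability t.
    is_masked = (torch.rand(b, n, device=tokens.device) < t).float()
    corrupted = torch.where(is_masked.bool(),
                            torch.full_like(tokens, MASK_ID), tokens)
    logits = model(corrupted)
    # Cross-entropy on masked positions only, weighted by 1/t
    # (the standard masked-diffusion ELBO weighting).
    ce = F.cross_entropy(logits.transpose(1, 2), tokens, reduction="none")
    loss = ((ce * is_masked) / t).sum() / is_masked.sum().clamp(min=1)
    return loss
```

A blockwise variant, as studied in the paper, would presumably restrict masking and denoising to one contiguous block of positions at a time rather than masking uniformly across the whole sequence.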

Published 29 Apr 2026