BlockBatch: Multi-Scale Consensus Decoding for Efficient Diffusion Language Model Inference

📰 arXiv cs.AI

arXiv:2605.29233v1 (Announce Type: cross)

Abstract: Diffusion language models (dLLMs) generate text by iteratively denoising multiple token positions in parallel, offering an attractive alternative to strictly autoregressive decoding. In practice, however, block-wise dLLM inference exposes a difficult granularity trade-off: small blocks preserve local conditioning but require many denoising steps, whereas large blocks expose more parallelism but can make premature commitments and accumulate cache
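The granularity trade-off above can be made concrete with a toy sketch of generic block-wise decoding. This is not the BlockBatch method, which the truncated abstract does not describe. The `blockwise_decode` function, the mock denoisers `confident` and `unsure`, and the confidence-threshold commit rule (commit every masked position whose confidence clears `threshold`, otherwise only the single most confident one) are all hypothetical assumptions chosen for illustration.

```python
from typing import Callable, List, Tuple

MASK = -1  # hypothetical sentinel for a still-masked position (token ids are >= 0)

# Hypothetical denoiser signature: given the current sequence and the masked
# indices of the active block, return one (token, confidence) per index.
Denoiser = Callable[[List[int], List[int]], List[Tuple[int, float]]]


def blockwise_decode(n: int, block_size: int, denoise: Denoiser,
                     threshold: float = 0.9) -> Tuple[List[int], int]:
    """Decode n positions block by block; return (tokens, denoising steps)."""
    seq = [MASK] * n
    steps = 0
    # Blocks are processed left to right; earlier blocks stay fixed.
    for start in range(0, n, block_size):
        end = min(start + block_size, n)
        while True:
            masked = [i for i in range(start, end) if seq[i] == MASK]
            if not masked:
                break
            preds = denoise(seq, masked)
            steps += 1
            # Commit, in parallel, every position above the threshold...
            chosen = [k for k, (_, conf) in enumerate(preds) if conf >= threshold]
            # ...but always commit the most confident one so each step progresses.
            if not chosen:
                chosen = [max(range(len(preds)), key=lambda k: preds[k][1])]
            for k in chosen:
                seq[masked[k]] = preds[k][0]
    return seq, steps


def confident(seq: List[int], masked: List[int]) -> List[Tuple[int, float]]:
    # Mock: every prediction clears the threshold.
    return [(i % 7, 1.0) for i in masked]


def unsure(seq: List[int], masked: List[int]) -> List[Tuple[int, float]]:
    # Mock: no prediction clears the threshold.
    return [(i % 7, 0.5) for i in masked]


for b in (4, 8, 16):
    print(b, blockwise_decode(32, b, confident)[1], blockwise_decode(32, b, unsure)[1])
# 4 8 32
# 8 4 32
# 16 2 32
```

With the always-confident mock, the step count equals the number of blocks, so small blocks cost more denoising steps; with the unsure mock, every block size falls back to one token per step. A real denoiser lies between these extremes, and a larger block lets it commit more tokens per step while conditioning each one on fewer already-fixed neighbours, which is where the premature commitments the abstract mentions can arise.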

Published 29 May 2026