dLLM-Cache: Accelerating Diffusion Large Language Models with Adaptive Caching

📰 arXiv cs.AI

arXiv:2506.06295v2. Abstract: Autoregressive Models (ARMs) have long dominated the landscape of Large Language Models. Recently, a new paradigm has emerged in the form of diffusion-based Large Language Models (dLLMs), which generate text by iteratively denoising masked segments. This approach has shown significant advantages and potential. However, dLLMs suffer from high inference latency. Traditional ARM acceleration techniques, such as Key-Value caching, are incompa

Published 3 Jun 2026
