Structuring The Future: Diffusion LLM Speculative Decoding via Calibrated Draft Graphs

📰 ArXiv cs.AI

arXiv:2509.18085v4

Abstract: Diffusion LLMs (dLLMs) have recently emerged as a powerful alternative to autoregressive LLMs (AR-LLMs) with the potential to operate at significantly higher token-generation rates. To unlock this potential, we present Spiffy, a speculative decoding algorithm to accelerate dLLM inference while provably preserving the model's output distribution. This work addresses the unique challenges involved in applying ideas from speculative decoding

Published 12 Jun 2026
