CreditDecoding: Accelerating Parallel Decoding in Diffusion Large Language Models with Trace Credit
📰 ArXiv cs.AI
arXiv:2510.06133v2 Announce Type: replace-cross Abstract: Diffusion large language models (dLLMs) generate text through iterative denoising. In commonly adopted parallel decoding schemes, each step confirms only high-confidence positions while remasking the others. By analyzing dLLM denoising traces, we uncover a key inefficiency: models often predict the correct target token several steps before its confidence becomes high enough to be decoded. This gap between early prediction and late decodin
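The baseline scheme the abstract describes (confirm high-confidence positions, remask the rest) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the paper's CreditDecoding method: the names `MASK` and `parallel_decode_step`, the dictionary-based probability format, the 0.9 threshold, and the fallback that commits one token when nothing clears the threshold are all hypothetical choices made for the example.

```python
MASK = None  # hypothetical marker for a position that is still masked


def parallel_decode_step(tokens, probs, threshold=0.9):
    """Run one confidence-thresholded parallel decoding step.

    tokens: list of tokens, with MASK at positions not yet decoded.
    probs:  dict mapping each masked position to a dict of
            candidate token -> probability (assumed model output).
    Returns a new list; positions below the threshold stay masked.
    """
    out = list(tokens)
    candidates = []  # (confidence, position, predicted token)
    for i, tok in enumerate(tokens):
        if tok is not MASK:
            continue  # already decoded in an earlier step
        best = max(probs[i], key=probs[i].get)
        candidates.append((probs[i][best], i, best))

    # Confirm every masked position whose top prediction is confident enough.
    for conf, i, best in candidates:
        if conf >= threshold:
            out[i] = best

    # Assumption: if nothing cleared the threshold, commit the single most
    # confident prediction so the loop always makes progress.
    if candidates and all(conf < threshold for conf, _, _ in candidates):
        _, i, best = max(candidates)
        out[i] = best
    return out


tokens = ["The", MASK, MASK, "."]
probs = {
    1: {"cat": 0.95, "dog": 0.05},  # confident: decoded this step
    2: {"sat": 0.60, "ran": 0.40},  # top guess exists, but stays masked
}
print(parallel_decode_step(tokens, probs))
```

In this toy input, position 2 already ranks "sat" first but remains masked because its confidence is below the threshold, which is the kind of gap between early prediction and late decoding that the abstract points to.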