LongSpec: Long-Context Lossless Speculative Decoding with Efficient Drafting and Verification

📰 arXiv cs.AI

arXiv:2502.17421v3

Abstract: Large Language Models (LLMs) can now process extremely long contexts, so efficient inference over these extended inputs has become increasingly important, especially for emerging applications such as LLM agents that depend heavily on this capability. Speculative decoding (SD) offers a promising lossless acceleration technique compared to lossy alternatives such as quantization and model cascades. However, most state-of-the-art SD methods are

Published 8 Apr 2026
