SPEED-Bench: A Unified and Diverse Benchmark for Speculative Decoding
📰 ArXiv cs.AI
arXiv:2604.09557v1 Announce Type: cross

Abstract: Speculative Decoding (SD) has emerged as a critical technique for accelerating Large Language Model (LLM) inference. Unlike deterministic system optimizations, SD performance is inherently data-dependent, meaning that diverse and representative workloads are essential for accurately measuring its effectiveness. Existing benchmarks suffer from limited task diversity, inadequate support for throughput-oriented evaluation, and a reliance on high-lev…
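To see why SD performance is data-dependent, consider the standard analysis from the speculative decoding literature (not from this paper): if a draft model proposes k tokens per step and each is accepted with probability alpha, the expected number of tokens emitted per verification step is (1 − alpha^(k+1)) / (1 − alpha). The values of alpha and k below are illustrative assumptions, and acceptance rates vary substantially across workloads — which is exactly why a benchmark needs diverse tasks.

```python
def expected_tokens(alpha: float, k: int) -> float:
    """Expected tokens emitted per verification step when a draft model
    proposes k tokens, each accepted i.i.d. with probability alpha
    (standard speculative-decoding analysis; illustrative only)."""
    if alpha == 1.0:
        # Every draft token accepted, plus the verifier's bonus token.
        return float(k + 1)
    return (1.0 - alpha ** (k + 1)) / (1.0 - alpha)

# Same draft length, very different effective speedups depending on
# how well the draft model matches the workload:
for alpha in (0.3, 0.6, 0.9):
    print(f"alpha={alpha}: {expected_tokens(alpha, k=4):.2f} tokens/step")
```

A workload where the draft model agrees with the target 90% of the time yields roughly 4 tokens per verification step, while a 30%-agreement workload yields barely more than 1 — the same system configuration, but very different measured acceleration.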