StepCache: Step-Level Reuse with Lightweight Verification and Selective Patching for LLM Serving

📰 ArXiv cs.AI

arXiv:2603.28795v1 Announce Type: cross

Abstract: We address LLM serving workloads where repeated requests share a common solution structure but differ in localized constraints, such as output schema, variable names, or numeric constants. Prior caching approaches typically reuse either full responses (semantic caching) or model-internal KV/prefix states; the former is brittle under partial changes, and the latter is tightly coupled to specific backends. We present StepCache, a backend-agnostic step-level…
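The abstract's core idea, reusing cached solution steps across requests that share structure but differ in localized constants, can be illustrated with a minimal sketch. All class and function names below are hypothetical illustrations, not the paper's actual interfaces; it keys the cache on a structure signature with constants abstracted out, patches the new request's constants into the cached template, and runs a lightweight consistency check before reuse.

```python
import re
from dataclasses import dataclass

@dataclass
class CachedStep:
    template: str  # step text with numeric constants replaced by {} slots

def normalize(step: str) -> str:
    """Structure signature: requests differing only in numeric constants
    map to the same cache key."""
    return re.sub(r"\d+", "<NUM>", step)

class StepCache:
    """Hypothetical sketch of step-level reuse with selective patching."""

    def __init__(self):
        self._store = {}

    def put(self, step: str):
        key = normalize(step)
        self._store[key] = CachedStep(template=re.sub(r"\d+", "{}", step))

    def get(self, step: str):
        """Reuse a cached step whose structure matches, selectively
        patching in the new request's constants, then verifying."""
        entry = self._store.get(normalize(step))
        if entry is None:
            return None  # cache miss: fall back to the model
        constants = re.findall(r"\d+", step)
        patched = entry.template.format(*constants)
        return patched if self._verify(patched, step) else None

    @staticmethod
    def _verify(candidate: str, request: str) -> bool:
        # Lightweight check stub: a real system would run a cheap
        # consistency test; here we only confirm the request's
        # constants survived patching intact.
        return re.findall(r"\d+", candidate) == re.findall(r"\d+", request)

cache = StepCache()
cache.put("add 3 to x then multiply by 7")
# A structurally identical step with new constants is patched and reused:
print(cache.get("add 5 to x then multiply by 9"))
```

A structurally novel request misses the cache and falls through to normal generation, so reuse never overrides verification.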

Published 1 Apr 2026