LayerCache: Exploiting Layer-wise Velocity Heterogeneity for Efficient Flow Matching Inference

📰 ArXiv cs.AI

arXiv:2604.16492v1 (announce type: cross)

Abstract: Flow Matching models achieve state-of-the-art image generation quality but incur substantial inference cost due to iterative denoising through large Transformer networks. We observe that different layer groups within a Transformer exhibit markedly heterogeneous velocity dynamics: shallow layers are highly stable and amenable to aggressive caching, while deep layers undergo large velocity changes that demand full computation. Existing caching meth
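To make the layer-group idea concrete, here is a minimal Python sketch of caching one group of layers while always recomputing the other. It is not the paper's algorithm: the abstract is cut off before the method is described, so the fixed refresh schedule (`refresh_every`), the residual-reuse rule, the `GroupCachedModel` class, and the toy layers are all assumptions made purely for illustration.

```python
from typing import Callable, List, Optional

Vector = List[float]
Layer = Callable[[Vector, float], Vector]


class GroupCachedModel:
    """Toy velocity model with a cached shallow group and an uncached deep group.

    Assumption (not from the paper): the shallow group's residual contribution
    is refreshed every `refresh_every` steps and reused in between.
    """

    def __init__(self, shallow: List[Layer], deep: List[Layer], refresh_every: int):
        self.shallow = shallow
        self.deep = deep
        self.refresh_every = refresh_every
        self.shallow_calls = 0  # layer evaluations actually performed
        self.deep_calls = 0
        self._cached_residual: Optional[Vector] = None

    def velocity(self, x: Vector, t: float, step: int) -> Vector:
        # Recompute the shallow group only on refresh steps.
        if self._cached_residual is None or step % self.refresh_every == 0:
            h = x
            for layer in self.shallow:
                h = layer(h, t)
                self.shallow_calls += 1
            self._cached_residual = [hi - xi for hi, xi in zip(h, x)]
        # Reuse the (possibly stale) shallow residual on the current input.
        h = [xi + ri for xi, ri in zip(x, self._cached_residual)]
        # The deep group is always computed in full.
        for layer in self.deep:
            h = layer(h, t)
            self.deep_calls += 1
        return h


def euler_sample(model: GroupCachedModel, x0: Vector, num_steps: int) -> Vector:
    """Integrate dx/dt = v(x, t) from t = 0 to t = 1 with explicit Euler steps."""
    x = list(x0)
    dt = 1.0 / num_steps
    for step in range(num_steps):
        v = model.velocity(x, step * dt, step)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


# Hypothetical stand-ins for Transformer blocks.
def shallow_layer(h: Vector, t: float) -> Vector:
    return [0.9 * hi + 0.1 for hi in h]


def deep_layer(h: Vector, t: float) -> Vector:
    return [hi * (1.0 - t) - t for hi in h]


model = GroupCachedModel(
    shallow=[shallow_layer, shallow_layer],
    deep=[deep_layer, deep_layer],
    refresh_every=4,
)
sample = euler_sample(model, [0.5, -0.5], num_steps=8)
print(model.shallow_calls, model.deep_calls)
```

With 8 Euler steps and a refresh every 4 steps, the two shallow layers run on only 2 of the 8 steps, while the two deep layers run on every step. A real system would presumably decide when to refresh from measured velocity changes rather than a fixed schedule; that choice is left open here.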

Published 21 Apr 2026
