LayerCache: Exploiting Layer-wise Velocity Heterogeneity for Efficient Flow Matching Inference
📰 ArXiv cs.AI
arXiv:2604.16492v1 | Announce Type: cross
Abstract: Flow Matching models achieve state-of-the-art image generation quality but incur substantial inference cost due to iterative denoising through large Transformer networks. We observe that different layer groups within a Transformer exhibit markedly heterogeneous velocity dynamics: shallow layers are highly stable and amenable to aggressive caching, while deep layers undergo large velocity changes that demand full computation. Existing caching meth