FastCache: Fast Caching for Diffusion Transformer Through Learnable Linear Approximation

📰 arXiv cs.AI

FastCache accelerates Diffusion Transformer inference through learnable linear approximation and caching

Advanced · Published 30 Mar 2026
Action Steps
  1. Identify redundancy in the internal representations of Diffusion Transformers
  2. Implement hidden-state-level caching and compression using a learnable linear approximation (a sketch follows this list)
  3. Optimize inference by exploiting spatial-aware token representations
  4. Evaluate and refine the FastCache framework for improved acceleration
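The caching step (step 2) is the core of the approach. Below is a minimal PyTorch sketch of what hidden-state-level caching with a learnable linear approximation might look like; the class name `CachedBlock`, the relative-change test, and the `skip_threshold` value are illustrative assumptions, not the paper's actual implementation.

```python
import torch
import torch.nn as nn

class CachedBlock(nn.Module):
    """Wraps a DiT block and reuses a cheap learnable linear approximation
    when the hidden state has changed little since the previous timestep."""

    def __init__(self, block: nn.Module, hidden_dim: int, skip_threshold: float = 0.05):
        super().__init__()
        self.block = block                                # the original (expensive) transformer block
        self.approx = nn.Linear(hidden_dim, hidden_dim)   # learnable linear approximation
        self.skip_threshold = skip_threshold              # illustrative cutoff, not from the paper
        self.prev_hidden = None                           # cached hidden state from the previous step

    def forward(self, hidden: torch.Tensor) -> torch.Tensor:
        if self.prev_hidden is not None:
            # Relative change of the hidden state across denoising timesteps
            rel_change = (hidden - self.prev_hidden).norm() / (self.prev_hidden.norm() + 1e-6)
            if rel_change < self.skip_threshold:
                # Change is small: skip the full block, apply the cheap approximation instead
                self.prev_hidden = hidden.detach()
                return hidden + self.approx(hidden)
        # Otherwise run the full transformer block
        self.prev_hidden = hidden.detach()
        return self.block(hidden)
```

Wrapping each block of a pretrained DiT this way trades a single learned projection for the full attention-plus-MLP computation on timesteps where the hidden state barely moves.
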
Who Needs to Know This

AI engineers and researchers working on generative models can use FastCache to improve inference efficiency, and machine learning engineers can apply the same technique to speed up deployed models.

Key Insight

💡 Learnable linear approximation can effectively reduce the computational cost of Diffusion Transformers
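To see why, here is a rough per-block cost accounting, sketched under the assumption of a standard DiT block with N tokens and hidden width d (generic transformer arithmetic, not figures from the paper):

```latex
% Approximate per-step cost of one DiT block vs. its linear approximation
\underbrace{O(N^2 d)}_{\text{self-attention}} \;+\; \underbrace{O(N d^2)}_{\text{MLP}}
\quad\longrightarrow\quad
\underbrace{O(N d^2)}_{\text{linear approximation } W h + b}
```

Skipping a block therefore removes the quadratic-in-tokens attention term entirely and collapses the MLP into a single projection.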

Share This
🚀 FastCache accelerates Diffusion Transformer inference!