Training Transformers in Cosine Coefficient Space
📰 ArXiv cs.AI
Parameterizing transformer weight matrices in the discrete cosine transform (DCT) domain, and training the spectral coefficients directly, can improve performance while reducing the number of trainable parameters
Action Steps
- Parameterize weight matrices of a transformer in the 2D discrete cosine transform (DCT) domain
- Retain only the lowest-frequency coefficients to reduce dimensionality
- Reconstruct the full weight matrix via the inverse DCT at each forward pass
- Update the spectral coefficients directly through backpropagation
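The steps above can be sketched in a minimal way with NumPy and SciPy. This is an illustrative reconstruction only, not the paper's implementation: the matrix sizes, the 16x16 coefficient budget, and the `reconstruct_weight` helper are assumptions for the example. Only the low-frequency block in the top-left corner of the DCT grid is stored as a trainable parameter; the full weight matrix is rebuilt with the 2D inverse DCT on each forward pass.

```python
import numpy as np
from scipy.fft import dctn, idctn


def reconstruct_weight(coeffs, out_features, in_features):
    """Zero-pad the retained low-frequency DCT coefficients to the full
    weight shape, then apply the 2D inverse DCT to recover the matrix.

    Because the inverse DCT is linear, the gradient with respect to the
    coefficients is simply the forward DCT of the gradient with respect
    to the reconstructed weight, so backprop updates the spectral
    coefficients directly.
    """
    full = np.zeros((out_features, in_features))
    k0, k1 = coeffs.shape
    full[:k0, :k1] = coeffs  # low frequencies occupy the top-left corner
    return idctn(full, norm="ortho")


# Hypothetical sizes: a 64x64 weight matrix parameterized by only
# 16x16 = 256 trainable coefficients instead of 4096 weights.
rng = np.random.default_rng(0)
coeffs = rng.standard_normal((16, 16)) * 0.02  # the trainable parameters
W = reconstruct_weight(coeffs, 64, 64)
print(W.shape)  # → (64, 64)

# Sanity check: taking the forward DCT of W recovers the coefficients,
# with all higher-frequency entries exactly zero.
back = dctn(W, norm="ortho")
```

In a training loop, an autograd framework would treat `coeffs` as the parameter tensor and run the inverse-DCT reconstruction inside the forward pass, so the optimizer never sees the full-size weight matrix.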
Who Needs to Know This
ML researchers and engineers working on transformer models, who can use this approach to improve model performance and efficiency across a range of NLP tasks
Key Insight
💡 Parameterizing weight matrices in the DCT domain can improve transformer model performance
Share This
💡 Train transformers in cosine coefficient space for improved performance
DeepCamp AI