Training Transformers in Cosine Coefficient Space
📰 ArXiv cs.AI
Parameterizing transformer weight matrices in the discrete cosine transform (DCT) domain, and training the spectral coefficients directly, can improve performance while reducing the number of trainable parameters
Action Steps
- Parameterize weight matrices of a transformer in the 2D discrete cosine transform (DCT) domain
- Retain only the lowest-frequency coefficients to reduce dimensionality
- Reconstruct the full weight matrix via the inverse DCT at each forward pass
- Update the spectral coefficients directly through backpropagation
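The steps above can be sketched in a minimal way with NumPy and SciPy. This is an illustrative reconstruction only, not the paper's implementation: the matrix sizes, the 16x16 coefficient budget, and the `reconstruct_weight` helper are assumptions for the example. Only the low-frequency block in the top-left corner of the DCT grid is stored as a trainable parameter; the full weight matrix is rebuilt with the 2D inverse DCT on each forward pass.

```python
import numpy as np
from scipy.fft import dctn, idctn


def reconstruct_weight(coeffs, out_features, in_features):
    """Zero-pad the retained low-frequency DCT coefficients to the full
    weight shape, then apply the 2D inverse DCT to recover the matrix.

    Because the inverse DCT is linear, the gradient with respect to the
    coefficients is simply the forward DCT of the gradient with respect
    to the reconstructed weight, so backprop updates the spectral
    coefficients directly.
    """
    full = np.zeros((out_features, in_features))
    k0, k1 = coeffs.shape
    full[:k0, :k1] = coeffs  # low frequencies occupy the top-left corner
    return idctn(full, norm="ortho")


# Hypothetical sizes: a 64x64 weight matrix parameterized by only
# 16x16 = 256 trainable coefficients instead of 4096 weights.
rng = np.random.default_rng(0)
coeffs = rng.standard_normal((16, 16)) * 0.02  # the trainable parameters
W = reconstruct_weight(coeffs, 64, 64)
print(W.shape)  # → (64, 64)

# Sanity check: taking the forward DCT of W recovers the coefficients,
# with all higher-frequency entries exactly zero.
back = dctn(W, norm="ortho")
```

In a training loop, an autograd framework would treat `coeffs` as the parameter tensor and run the inverse-DCT reconstruction inside the forward pass, so the optimizer never sees the full-size weight matrix.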
Who Needs to Know This
ML researchers and engineers working on transformer models, who can use this approach to improve model performance and efficiency across a range of NLP tasks
Key Insight
💡 Parameterizing weight matrices in the DCT domain can improve transformer model performance
Share This
💡 Train transformers in cosine coefficient space for improved performance
DeepCamp AI