PoM: A Linear-Time Replacement for Attention with the Polynomial Mixer
📰 ArXiv cs.AI
PoM (Polynomial Mixer) is a drop-in replacement for self-attention in transformers that runs in linear rather than quadratic time in the sequence length
Action Steps
- Understand the limitations of self-attention in transformers, particularly its quadratic complexity
- Learn about the Polynomial Mixer (PoM) and its linear complexity
- Implement PoM as a drop-in replacement for self-attention in existing transformer models
- Evaluate the performance of PoM-equipped transformers on benchmark tasks
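To make the "drop-in replacement" step concrete, here is a minimal sketch of a linear-time token mixer in the same spirit: keys are polynomial feature maps of the input, and a fixed-size key-value state is accumulated causally so each token costs O(1) regardless of sequence length. This is an illustrative approximation under assumed shapes and projections (`Wq`, `Wv`, `degree` are hypothetical names), not the exact PoM formulation from the paper.

```python
import numpy as np

def poly_features(x, degree=2):
    # Elementwise polynomial expansion: concatenate [x, x**2, ..., x**degree]
    return np.concatenate([x**k for k in range(1, degree + 1)], axis=-1)

def linear_poly_mixer(X, Wq, Wv, degree=2):
    """Causal linear-time token mixer with polynomial feature maps.

    Illustrative sketch only -- not the paper's exact PoM operator.
    X:  (T, d) input tokens
    Wq: (d, degree*d) query projection (assumed shape)
    Wv: (d, d) value projection (assumed shape)
    Returns (T, d) mixed outputs in O(T) sequence steps.
    """
    T, d = X.shape
    V = X @ Wv                        # (T, d) values
    Phi = poly_features(X, degree)    # (T, degree*d) keys via polynomial map
    Q = X @ Wq                        # (T, degree*d) queries
    S = np.zeros((Phi.shape[-1], d))  # fixed-size running key-value state
    out = np.empty_like(V)
    for t in range(T):
        S += np.outer(Phi[t], V[t])   # accumulate state: O(1) per token
        out[t] = Q[t] @ S             # readout cost independent of t
    return out
```

Because the state `S` has a fixed size, memory and per-token compute stay constant as the sequence grows, which is the property that removes attention's quadratic cost.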
Who Needs to Know This
ML researchers and engineers working on transformer models: PoM offers a more efficient alternative to self-attention, enabling faster training and inference, especially on long sequences
Key Insight
💡 PoM achieves linear complexity while preserving the contextual mapping property, making it a promising replacement for self-attention
Share This
🚀 PoM: A faster alternative to self-attention in transformers! 🤖
DeepCamp AI