Higher-order Linear Attention

📰 ArXiv cs.AI

arXiv:2510.27258v2. Abstract: The quadratic cost of scaled dot-product attention is a central obstacle to scaling autoregressive language models to long contexts. Linear-time attention and State Space Models (SSMs) provide scalable alternatives but are typically restricted to first-order or kernel-based approximations, which can limit expressivity. We introduce Higher-order Linear Attention (HLA), a causal, streaming mechanism that realizes higher interactions via com
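
For context on the first-order baseline the abstract contrasts against, here is a minimal sketch of causal linear attention in its simplest form. It is not HLA and not the paper's code. It assumes an identity feature map and no normalization, and the function names are hypothetical. It shows that a running state S = sum of outer products k_i v_i^T reproduces the quadratic-time causal sum exactly while doing constant work per token.

```python
from typing import List

Matrix = List[List[float]]


def dot(a: List[float], b: List[float]) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def causal_attention_quadratic(Q: Matrix, K: Matrix, V: Matrix) -> Matrix:
    """Reference: o_t = sum over i <= t of (q_t . k_i) * v_i.

    No softmax and no normalization (an illustrative assumption).
    Total work grows quadratically with sequence length.
    """
    d_v = len(V[0])
    out = []
    for t, q in enumerate(Q):
        o = [0.0] * d_v
        for i in range(t + 1):
            w = dot(q, K[i])
            for j in range(d_v):
                o[j] += w * V[i][j]
        out.append(o)
    return out


def causal_attention_streaming(Q: Matrix, K: Matrix, V: Matrix) -> Matrix:
    """Streaming form: keep S_t = sum over i <= t of k_i v_i^T, output q_t^T S_t.

    The state S has fixed size d_k x d_v, so each token costs O(d_k * d_v)
    regardless of how many tokens came before it.
    """
    d_k, d_v = len(K[0]), len(V[0])
    S = [[0.0] * d_v for _ in range(d_k)]
    out = []
    for q, k, v in zip(Q, K, V):
        # Rank-1 update of the running state with the current key/value pair.
        for a in range(d_k):
            for b in range(d_v):
                S[a][b] += k[a] * v[b]
        # Read out with the current query.
        out.append([sum(q[a] * S[a][b] for a in range(d_k)) for b in range(d_v)])
    return out


if __name__ == "__main__":
    Q = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    K = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
    V = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    print(causal_attention_streaming(Q, K, V))
```

The two functions agree because q_t^T (sum of k_i v_i^T) equals the sum of (q_t . k_i) v_i^T by linearity. That only holds when the attention weights are linear in the keys, which is the first-order restriction the abstract refers to.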

Published 14 May 2026