Native Hybrid Attention for Efficient Sequence Modeling
📰 ArXiv cs.AI
arXiv:2510.07019v3 Abstract: Transformers excel at sequence modeling but face quadratic complexity, while linear attention offers improved efficiency but often compromises recall accuracy over long contexts. In this work, we introduce Native Hybrid Attention (NHA), a novel hybrid architecture of linear and full attention that integrates both intra- and inter-layer hybridization into a unified layer design. NHA maintains long-term context in key-value slots updated by a