Dynamic Linear Attention
📰 ArXiv cs.AI
arXiv:2606.10650v1
Announce Type: cross
Abstract: The scalability of Large Language Models (LLMs) to long contexts is fundamentally constrained by the quadratic complexity of standard attention, motivating the adoption of linear attention mechanisms with sub-quadratic cost. To improve representation capacity under long contexts, recent approaches organize memory in a multi-state manner. However, existing multi-state linear attention methods rely on fixed state merging policies that cannot adapt
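To make the abstract's contrast between quadratic attention and linear attention concrete, here is a minimal sketch of generic single-state causal linear attention. It is not the paper's method: the multi-state organization and the merging policy are not described in the excerpt above and are not modeled here. The feature map `phi` (elu(x) + 1) and both function names are assumptions chosen for illustration. The recurrent version keeps a fixed-size state, so its per-token cost does not grow with context length, while the pairwise version revisits every earlier token.

```python
import math


def phi(x):
    # Assumed positive feature map, elu(x) + 1; not taken from the abstract.
    return [v + 1.0 if v > 0 else math.exp(v) for v in x]


def linear_attention_recurrent(Q, K, V):
    """Causal linear attention using a fixed-size running state."""
    d_k, d_v = len(K[0]), len(V[0])
    S = [[0.0] * d_v for _ in range(d_k)]  # running sum of phi(k) v^T
    z = [0.0] * d_k                        # running sum of phi(k)
    out = []
    for q, k, v in zip(Q, K, V):
        fq, fk = phi(q), phi(k)
        # Fold the current token into the state: O(d_k * d_v) work per token.
        for a in range(d_k):
            z[a] += fk[a]
            for b in range(d_v):
                S[a][b] += fk[a] * v[b]
        denom = sum(fq[a] * z[a] for a in range(d_k))
        out.append(
            [sum(fq[a] * S[a][b] for a in range(d_k)) / denom for b in range(d_v)]
        )
    return out


def linear_attention_quadratic(Q, K, V):
    """Same outputs, computed pairwise over all earlier tokens (quadratic)."""
    d_v = len(V[0])
    out = []
    for i, q in enumerate(Q):
        fq = phi(q)
        # Unnormalized weight of token j for query i.
        w = [sum(a * b for a, b in zip(fq, phi(K[j]))) for j in range(i + 1)]
        total = sum(w)
        out.append(
            [sum(w[j] * V[j][b] for j in range(i + 1)) / total for b in range(d_v)]
        )
    return out
```

Both functions return the same values up to floating-point error. The difference is that the recurrent form summarizes the whole past in `S` and `z`, which is the single memory state that multi-state approaches split into several.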