MeSH: Memory-as-State-Highways for Recursive Transformers

📰 arXiv cs.AI

arXiv:2510.07739v2 (Announce Type: replace-cross)

Abstract: Recursive transformers reuse parameters and iterate over hidden states multiple times, decoupling compute depth from parameter depth. However, under matched compute, recursive models with fewer parameters often lag behind their non-recursive counterparts. By probing hidden states, we trace this performance gap to two primary bottlenecks: undifferentiated computation, where the core is forced to adopt a similar computational pattern at every iteration…
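To make "decoupling compute depth from parameter depth" concrete, here is a minimal toy sketch, assuming a drastically simplified block (a residual `tanh` layer with no attention). It compares a non-recursive stack of distinct blocks with one shared core applied the same number of times. The names `ToyBlock`, `run_stacked`, `run_recursive`, and `compare` are hypothetical illustrations. This is not the paper's model and does not implement MeSH.

```python
import math
import random


class ToyBlock:
    """Hypothetical stand-in for a transformer block: h -> h + tanh(W h + b)."""

    def __init__(self, d: int, rng: random.Random) -> None:
        self.w = [[rng.gauss(0.0, 0.1) for _ in range(d)] for _ in range(d)]
        self.b = [0.0] * d
        self.calls = 0  # counts how many times this block is applied

    def num_params(self) -> int:
        return sum(len(row) for row in self.w) + len(self.b)

    def __call__(self, h: list) -> list:
        self.calls += 1
        # Residual update: output_i = h_i + tanh((W h)_i + b_i).
        return [
            x + math.tanh(sum(wij * hj for wij, hj in zip(row, h)) + bias)
            for row, bias, x in zip(self.w, self.b, h)
        ]


def run_stacked(blocks, h):
    # Non-recursive: every layer has its own parameters.
    for block in blocks:
        h = block(h)
    return h


def run_recursive(core, h, num_iters):
    # Recursive: one shared core iterates over the hidden state.
    for _ in range(num_iters):
        h = core(h)
    return h


def compare(d: int = 8, depth: int = 4, seed: int = 0) -> dict:
    rng = random.Random(seed)
    stacked = [ToyBlock(d, rng) for _ in range(depth)]
    core = ToyBlock(d, rng)
    h0 = [rng.uniform(-1.0, 1.0) for _ in range(d)]
    stacked_out = run_stacked(stacked, h0)
    recursive_out = run_recursive(core, h0, depth)
    return {
        # Compute depth: total block applications (matched by construction).
        "stacked_applications": sum(b.calls for b in stacked),
        "recursive_applications": core.calls,
        # Parameter depth: distinct parameters actually stored.
        "stacked_params": sum(b.num_params() for b in stacked),
        "recursive_params": core.num_params(),
        "stacked_out": stacked_out,
        "recursive_out": recursive_out,
    }


if __name__ == "__main__":
    stats = compare()
    print(stats["stacked_params"], stats["recursive_params"])  # 288 72
```

With `d = 8` and four block applications, both variants spend the same compute, but the recursive one stores a quarter of the parameters (72 versus 288). That is the "matched compute, fewer parameters" setting in which the abstract reports recursive models often lagging behind.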

Published 21 Apr 2026