MeSH: Memory-as-State-Highways for Recursive Transformers
📰 ArXiv cs.AI
arXiv:2510.07739v2 Announce Type: replace-cross
Abstract: Recursive transformers reuse parameters and iterate over hidden states multiple times, decoupling compute depth from parameter depth. However, under matched compute, recursive models with fewer parameters often lag behind non-recursive counterparts. By probing hidden states, we trace this performance gap to two primary bottlenecks: undifferentiated computation, where the core is forced to adopt a similar computational pattern at every ite
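As a rough illustration of the first sentence of the abstract only (parameter reuse across iterations), the sketch below applies one shared block repeatedly to a hidden state. It is not the MeSH method and not code from the paper: the names `core`, `recursive_forward`, and `param_count`, the tanh residual update, and the toy weights are all assumptions chosen for illustration.

```python
import math

def core(h, w, b):
    """One shared block (hypothetical): h <- h + tanh(W h + b)."""
    out = []
    for i in range(len(h)):
        z = sum(w[i][j] * h[j] for j in range(len(h))) + b[i]
        out.append(h[i] + math.tanh(z))
    return out

def recursive_forward(h, w, b, num_iterations):
    """Apply the same block (same w, b) num_iterations times."""
    for _ in range(num_iterations):
        h = core(h, w, b)
    return h

def param_count(w, b):
    """Number of scalar parameters in the shared block."""
    return sum(len(row) for row in w) + len(b)

# Toy weights and input, assumed for illustration only.
w = [[0.1, -0.2], [0.3, 0.05]]
b = [0.0, 0.1]
h0 = [1.0, -1.0]

shallow = recursive_forward(h0, w, b, num_iterations=1)  # compute depth 1
deep = recursive_forward(h0, w, b, num_iterations=4)     # compute depth 4

# Parameter count is the same for both runs; only compute depth differs.
print(param_count(w, b), shallow, deep)
```

Because every iteration calls the same `core` with the same weights, this toy loop also shows why the abstract speaks of a similar computational pattern at every iteration: nothing in the loop distinguishes one pass from the next.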