Graph Memory Transformer (GMT)

📰 arXiv cs.AI

arXiv:2604.23862v1 (announce type: cross)

Abstract: We investigate whether the Feed-Forward Network (FFN) sublayer in a decoder-only transformer can be replaced by an explicit learned memory graph while preserving the surrounding autoregressive architecture. The proposed Graph Memory Transformer (GMT) keeps causal self-attention intact, but replaces the usual per-token FFN transformation with a memory cell that routes token representations over a learned bank of centroids connected by a learned di
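To make the idea of swapping the per-token FFN for a memory cell more concrete, here is a minimal pure-Python sketch. It is not the paper's method: the abstract excerpt does not specify the routing rule, the edge structure, or the readout, so the class name `GraphMemoryCell`, the softmax routing, the dense `edge_logits` matrix, the single propagation hop, and the per-centroid `values` are all assumptions made for illustration.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


class GraphMemoryCell:
    """Hypothetical stand-in for an FFN sublayer (illustrative only)."""

    def __init__(self, dim, num_centroids, seed=0):
        rng = random.Random(seed)

        def init(rows, cols):
            return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]

        self.dim = dim
        self.k = num_centroids
        self.centroids = init(num_centroids, dim)             # learned bank of centroids
        self.edge_logits = init(num_centroids, num_centroids)  # ASSUMED dense edge weights
        self.values = init(num_centroids, dim)                # ASSUMED per-centroid outputs

    def route(self, h):
        """Return a distribution over centroids after one assumed graph hop."""
        # Step 1 (assumed): soft-assign the token to centroids by dot-product similarity.
        start = softmax([dot(h, c) for c in self.centroids])
        # Step 2 (assumed): row-normalise edge logits into transition probabilities
        # and move the assignment mass one hop along the graph.
        trans = [softmax(row) for row in self.edge_logits]
        return [sum(start[i] * trans[i][j] for i in range(self.k)) for j in range(self.k)]

    def __call__(self, h):
        """Map one token representation to an output of the same width."""
        weights = self.route(h)
        # Step 3 (assumed): read out a weighted mix of per-centroid value vectors.
        return [
            sum(weights[j] * self.values[j][d] for j in range(self.k))
            for d in range(self.dim)
        ]


cell = GraphMemoryCell(dim=4, num_centroids=3)
token = [0.5, -1.0, 0.25, 2.0]
output = cell(token)  # same width as the input, as an FFN replacement would need
```

Like an FFN, this cell acts on each token independently and returns a vector of the same width, so it could sit after causal self-attention without changing the rest of the block. In a trained model the three parameter tables would be learned by gradient descent rather than sampled at random.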

Published 28 Apr 2026
