The Long Delay to Arithmetic Generalization: When Learned Representations Outrun Behavior

📰 ArXiv cs.AI

arXiv:2604.13082v1 (announce type: cross)

Abstract: Grokking in transformers trained on algorithmic tasks is characterized by a long delay between training-set fit and abrupt generalization, but the source of that delay remains poorly understood. In encoder-decoder arithmetic models, we argue that this delay reflects limited access to already learned structure rather than failure to acquire that structure in the first place. We study one-step Collatz prediction and find that the encoder organizes

Published 16 Apr 2026
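
For readers unfamiliar with the task named in the abstract, here is a minimal sketch of what "one-step Collatz prediction" plausibly means: given a positive integer n, predict the next value under the standard Collatz map (n/2 if n is even, 3n+1 if n is odd). This is an assumption; the excerpt does not define the map, and the paper may use a variant (for example, the shortcut map that sends odd n to (3n+1)/2) or a particular digit encoding. The names `collatz_step` and `make_pairs` are hypothetical and are not taken from the paper.

```python
def collatz_step(n: int) -> int:
    """Apply one step of the standard Collatz map (assumed variant)."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    # Even inputs are halved; odd inputs map to 3n + 1.
    return n // 2 if n % 2 == 0 else 3 * n + 1


def make_pairs(limit: int) -> list[tuple[int, int]]:
    """Build (input, target) pairs for n = 1..limit, as a toy dataset."""
    return [(n, collatz_step(n)) for n in range(1, limit + 1)]


if __name__ == "__main__":
    for n, target in make_pairs(8):
        print(n, "->", target)
```

An encoder-decoder model for this task would read a tokenized `n` and emit a tokenized target; how the integers are tokenized is not stated in the excerpt.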
