Layerwise Dynamics for In-Context Classification in Transformers

ArXiv cs.AI

arXiv:2604.11613v1 (Announce Type: cross)

Abstract: Transformers can perform in-context classification from a few labeled examples, yet the inference-time algorithm remains opaque. We study multi-class linear classification in the hard no-margin regime and make the computation identifiable by enforcing feature- and label-permutation equivariance at every layer. This enables interpretability while maintaining functional equivalence and yields highly structured weights. From these models we extract
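The abstract's key constraint is permutation equivariance: permuting the in-context examples (or the labels) should permute the layer's output in exactly the same way. The paper's architecture is not shown here, but a minimal DeepSets-style sketch illustrates the property being enforced — `W1`, `W2`, and the mean-pooling form are illustrative assumptions, not the authors' construction.

```python
import numpy as np

# Illustrative sketch (NOT the paper's architecture): a layer equivariant
# to permutations of its n input items, f(X) = X @ W1 + mean(X) @ W2.
# The pooled term is identical for every item, so permuting the rows of X
# permutes the rows of f(X) identically.

rng = np.random.default_rng(0)
d_in, d_out, n = 4, 3, 5
W1 = rng.standard_normal((d_in, d_out))  # hypothetical per-item weights
W2 = rng.standard_normal((d_in, d_out))  # hypothetical shared weights

def equivariant_layer(X):
    pooled = X.mean(axis=0, keepdims=True)  # permutation-invariant summary
    return X @ W1 + pooled @ W2             # per-item term + broadcast term

X = rng.standard_normal((n, d_in))
perm = rng.permutation(n)

# Equivariance check: permute-then-apply equals apply-then-permute.
assert np.allclose(equivariant_layer(X[perm]), equivariant_layer(X)[perm])
```

Stacking such layers keeps the whole network equivariant, which is what makes the learned computation identifiable up to relabeling of examples and classes.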

Published 14 Apr 2026