Prototype Transformer: Towards Language Model Architectures Interpretable by Design

📰 arXiv cs.AI

arXiv:2602.11852v2 (Announce Type: replace)

Abstract: While state-of-the-art language models (LMs) surpass most humans in certain domains, their reasoning remains largely opaque, reducing trust and increasing the risk of deception and hallucination. We introduce the Prototype Transformer (ProtoT), an autoregressive LM architecture that replaces the quadratic-cost self-attention module of the Transformer with a linear-cost module based on prototypes, which are learned parameter vectors. In ProtoT,
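The abstract contrasts the quadratic cost of self-attention with the linear cost of a prototype-based module, but it is cut off before describing how ProtoT's module actually works. The Python sketch below therefore illustrates only that scaling difference, under stated assumptions. It counts multiply-adds for scoring every token against every other token, then does the same for scoring every token against a fixed set of K learned prototype vectors. The function names and the parameters `num_prototypes` and `d_model` are hypothetical. This is not the paper's mechanism, and in particular it does not show how ProtoT moves information between positions.

```python
from typing import List

Vector = List[float]


def attention_score_count(seq_len: int, d_model: int) -> int:
    """Multiply-adds for a full T x T token-token score matrix (self-attention)."""
    return seq_len * seq_len * d_model


def prototype_score_count(seq_len: int, num_prototypes: int, d_model: int) -> int:
    """Multiply-adds for scoring T tokens against K prototype vectors.

    Hypothetical illustration: K is fixed and does not grow with T,
    so the cost is linear in sequence length.
    """
    return seq_len * num_prototypes * d_model


def prototype_scores(tokens: List[Vector], prototypes: List[Vector]) -> List[Vector]:
    """Dot product of each token with each prototype, giving a T x K matrix.

    Assumption: prototypes are plain learned vectors of the same width as tokens.
    """
    return [
        [sum(t * p for t, p in zip(tok, proto)) for proto in prototypes]
        for tok in tokens
    ]


if __name__ == "__main__":
    d_model, num_prototypes = 64, 32  # hypothetical sizes
    for seq_len in (1024, 2048):
        print(
            seq_len,
            attention_score_count(seq_len, d_model),
            prototype_score_count(seq_len, num_prototypes, d_model),
        )
```

Doubling the sequence length quadruples the token-token count but only doubles the token-prototype count, which is the quadratic-versus-linear distinction the abstract draws.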

Published 2 Jun 2026
