Do Sparse Autoencoders Capture Concept Manifolds?

📰 ArXiv cs.AI

arXiv:2604.28119v1 | Announce Type: cross

Abstract: Sparse autoencoders (SAEs) are widely used to extract interpretable features from neural network representations, often under the implicit assumption that concepts correspond to independent linear directions. However, a growing body of evidence suggests that many concepts are instead organized along low-dimensional manifolds encoding continuous geometric relationships. This raises three basic questions: what does it mean for an SAE to capture a m…
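The "independent linear directions" assumption the abstract refers to is the standard SAE setup: a ReLU encoder produces a sparse nonnegative code, and a linear decoder reconstructs the activation as a sum of per-feature directions. A minimal NumPy sketch of that forward pass and loss (toy dimensions and random weights chosen for illustration; the paper's actual architecture and training details may differ):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy sizes; real SAEs use a hidden layer much wider than the model dimension.
d_model, d_hidden = 8, 32

# Randomly initialized SAE parameters (illustrative only).
W_enc = rng.normal(0.0, 0.1, (d_hidden, d_model))
b_enc = np.zeros(d_hidden)
W_dec = rng.normal(0.0, 0.1, (d_model, d_hidden))
b_dec = np.zeros(d_model)

def sae_forward(x):
    """Encode an activation vector into a sparse code, then reconstruct it."""
    f = np.maximum(0.0, W_enc @ x + b_enc)  # ReLU -> nonnegative, sparse features
    x_hat = W_dec @ f + b_dec               # linear decoder: sum of feature directions
    return f, x_hat

def sae_loss(x, l1_coeff=1e-3):
    """Reconstruction error plus an L1 penalty that encourages few active features."""
    f, x_hat = sae_forward(x)
    recon = np.sum((x - x_hat) ** 2)
    sparsity = l1_coeff * np.sum(np.abs(f))
    return recon + sparsity

x = rng.normal(size=d_model)
f, x_hat = sae_forward(x)
print(f.shape, x_hat.shape)  # (32,) (8,)
```

The paper's question, as the abstract frames it, is whether this directions-as-features decomposition can represent concepts that live on a continuous low-dimensional manifold rather than along separate axes.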

Published 1 May 2026