Improving Sparse Autoencoder with Dynamic Attention

📰 ArXiv cs.AI

arXiv:2604.14925v1 (announce type: cross)

Abstract: Recently, sparse autoencoders (SAEs) have emerged as a promising technique for interpreting activations in foundation models by disentangling features into a sparse set of concepts. However, identifying the optimal level of sparsity for each neuron remains challenging in practice: excessive sparsity can lead to poor reconstruction, whereas insufficient sparsity may harm interpretability. While existing activation functions such as ReLU and TopK p
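For readers unfamiliar with the two baseline activations the abstract names, the following is a minimal sketch of how ReLU and TopK impose sparsity on an SAE's latent vector. It is not the paper's method: the helper names (`relu`, `topk`, `l0`), the tie-breaking rule, and the example pre-activation values are assumptions chosen purely for illustration.

```python
def relu(pre_acts):
    """Keep positive pre-activations; how many survive depends on the input."""
    return [max(0.0, x) for x in pre_acts]


def topk(pre_acts, k):
    """Keep the k largest pre-activations and zero the rest (fixed budget)."""
    if k <= 0:
        return [0.0] * len(pre_acts)
    # Indices of the k largest values; ties go to the earlier index (an assumption).
    order = sorted(range(len(pre_acts)), key=lambda i: (-pre_acts[i], i))
    keep = set(order[:k])
    return [x if i in keep else 0.0 for i, x in enumerate(pre_acts)]


def l0(acts):
    """Count non-zero latents, the usual sparsity measure for SAEs."""
    return sum(1 for a in acts if a != 0.0)


# Hypothetical encoder pre-activations for a single input.
pre_acts = [0.9, -0.3, 0.1, 0.05, -1.2, 0.4]

relu_out = relu(pre_acts)
topk_out = topk(pre_acts, k=2)

print("ReLU:", relu_out, "active latents:", l0(relu_out))
print("TopK:", topk_out, "active latents:", l0(topk_out))
```

In this toy input, ReLU leaves four latents active while TopK with `k=2` leaves exactly two, which illustrates the trade-off the abstract describes: ReLU's sparsity level is whatever the input happens to produce, and TopK's is a single fixed budget applied to every input.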

Published 17 Apr 2026
