Mechanistic Interpretability Needs Philosophy

📰 ArXiv cs.AI

arXiv:2506.18852v2

Abstract: Mechanistic interpretability (MI) aims to explain how neural networks work by uncovering their underlying mechanisms. As the field grows in influence, it is increasingly important to examine not just models themselves, but the assumptions, concepts and explanatory strategies implicit in MI research. We argue that mechanistic interpretability needs philosophy as an ongoing partner in clarifying its concepts, refining its methods, and navig

Published 20 May 2026
