Do Hallucination Neurons Generalize? Evidence from Cross-Domain Transfer in LLMs
📰 ArXiv cs.AI
arXiv:2604.19765v1 Abstract: Recent work identifies a sparse set of "hallucination neurons" (H-neurons), fewer than 0.1% of feed-forward network neurons, that reliably predict when large language models will hallucinate. These neurons are identified on general-knowledge question answering and shown to generalize to new evaluation instances. We ask a natural follow-up question: do H-neurons generalize across knowledge domains? Using a systematic cross-domain transfer protocol
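The abstract sketches the core idea at a high level: select a tiny fraction of predictive neurons on one domain, then test whether that fixed set still predicts hallucination on another. A minimal synthetic sketch of such a cross-domain transfer check is below. Everything here is an assumption for illustration: the activation matrices, labels, correlation-based neuron selection, and mean-threshold probe are hypothetical stand-ins, not the paper's actual method.

```python
import numpy as np

# Hypothetical setup: activations of N feed-forward neurons across examples,
# each labeled 1 if the model hallucinated. All data below is synthetic.
n_neurons = 1000
informative = np.arange(5)  # assumed shared hallucination-predictive neurons

def make_domain(n_examples, seed):
    """Synthetic 'domain': random activations plus signal in a few neurons."""
    rng = np.random.default_rng(seed)
    y = rng.integers(0, 2, n_examples)
    X = rng.normal(size=(n_examples, n_neurons))
    X[:, informative] += 2.0 * y[:, None]  # same neurons carry signal in both domains
    return X, y

X_a, y_a = make_domain(400, seed=1)  # source domain (e.g. general-knowledge QA)
X_b, y_b = make_domain(200, seed=2)  # target domain (a different knowledge area)

# Step 1: on domain A, rank neurons by |correlation| with hallucination labels
corr = np.abs(np.corrcoef(X_a.T, y_a)[-1, :-1])
k = max(1, int(0.001 * n_neurons))  # keep < 0.1% of neurons, as in the abstract
h_neurons = np.argsort(corr)[-k:]

# Step 2: fit a trivial mean-threshold probe on the selected neurons
score_a = X_a[:, h_neurons].mean(axis=1)
threshold = 0.5 * (score_a[y_a == 1].mean() + score_a[y_a == 0].mean())

# Step 3: transfer the frozen neuron set and threshold to domain B
score_b = X_b[:, h_neurons].mean(axis=1)
acc = ((score_b > threshold).astype(int) == y_b).mean()
print(f"selected neurons: {sorted(h_neurons.tolist())}, transfer accuracy: {acc:.2f}")
```

Above-chance accuracy on domain B with the frozen, domain-A-selected neurons is what "cross-domain transfer" would mean operationally; a real protocol would use actual model activations and a stronger probe.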