Principled Detection of Hallucinations in Large Language Models via Multiple Testing
📰 ArXiv cs.AI
arXiv:2508.18473v3. Abstract: While Large Language Models (LLMs) have emerged as powerful foundational models to solve a variety of tasks, they have also been shown to be prone to hallucinations, i.e., generating responses that sound confident but are actually incorrect or even nonsensical. Existing hallucination detectors propose a wide range of empirical scoring rules, but their performance varies across models and datasets, and it is hard to determine which ones to