Justified or Just Convincing? Error Verifiability as a Dimension of LLM Quality
📰 ArXiv cs.AI
arXiv:2604.04418v1 Announce Type: cross

Abstract: As LLMs are deployed in high-stakes settings, users must judge the correctness of individual responses, often relying on model-generated justifications such as reasoning chains or explanations. Yet no standard measure exists for whether these justifications help users distinguish correct answers from incorrect ones. We formalize this idea as error verifiability and propose $v_{\text{bal}}$, a balanced metric that measures whether justifications