Interpretable Coreference Resolution Evaluation Using Explicit Semantics

📰 ArXiv cs.AI

arXiv:2605.10627v1 (announce type: cross)

Abstract: Coreference resolution is typically evaluated using aggregate statistical metrics such as CoNLL-F1, which measure structural overlap between predicted and gold clusters. While widely used, these metrics offer limited diagnostic insights, penalizing errors without revealing whether a system struggles with specific semantic categories, such as people, locations, or events, and making it difficult to interpret model capabilities or derive actionable

Published 12 May 2026
