ROSE: An Intent-Centered Evaluation Metric for NL2SQL

📰 ArXiv cs.AI

arXiv:2604.12988v1

Abstract: Execution Accuracy (EX), the widely used metric for evaluating the effectiveness of Natural Language to SQL (NL2SQL) solutions, is becoming increasingly unreliable. It is sensitive to syntactic variation, ignores that questions may admit multiple interpretations, and is easily misled by erroneous ground-truth SQL. To address this, we introduce ROSE, an intent-centered metric that focuses on whether the predicted SQL answers the question, rather t
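To make the criticism of EX concrete, here is a minimal sketch of an EX-style check, assuming the common simplification that a prediction counts as correct only when it returns exactly the same rows as the ground-truth SQL. The helper `execution_match`, the `employee` table, and the sample question are hypothetical illustrations and are not taken from the paper; ROSE itself is not implemented here.

```python
import sqlite3
from collections import Counter


def run(conn, sql):
    """Execute a query and return all of its rows."""
    return conn.execute(sql).fetchall()


def execution_match(conn, predicted_sql, gold_sql):
    """Simplified EX-style check (an assumption, not the paper's code):
    True only when both queries return the same multiset of rows."""
    return Counter(run(conn, predicted_sql)) == Counter(run(conn, gold_sql))


# Hypothetical toy database.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employee (name TEXT, dept TEXT, salary INTEGER);
    INSERT INTO employee VALUES
        ('Ana', 'eng', 120), ('Bo', 'eng', 100), ('Cy', 'ops', 90);
""")

# Hypothetical question: "Who is the highest-paid employee?"
gold = "SELECT name FROM employee ORDER BY salary DESC LIMIT 1"

# Different SQL text, same rows: the check accepts it.
same_result = (
    "SELECT name FROM employee "
    "WHERE salary = (SELECT MAX(salary) FROM employee)"
)

# Also answers the question, but returns an extra column: the check rejects it.
extra_column = "SELECT name, salary FROM employee ORDER BY salary DESC LIMIT 1"

match_equivalent = execution_match(conn, same_result, gold)
match_extra_column = execution_match(conn, extra_column, gold)
print(match_equivalent, match_extra_column)  # True False
```

Under this simplified check, the third query is marked wrong even though it names the same employee, which is one way a result-comparison metric can penalize a prediction that still answers the question.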

Published 15 Apr 2026
