Query-efficient model evaluation using cached responses

📰 ArXiv cs.AI

arXiv:2605.07096v1 Announce Type: cross

Abstract: Evaluating a new model on an existing benchmark is often necessary to understand its behavior before deployment. For modern evaluation frameworks, generating and evaluating a response for all queries can be prohibitively expensive. In practice, responses from previously evaluated models are often cached, creating a potential opportunity to use this additional information to decrease the number of queries required to accurately evaluate a new mo

Published 11 May 2026
