ProEval: Proactive Failure Discovery and Efficient Performance Estimation for Generative AI Evaluation
arXiv cs.AI
arXiv:2604.23099v1 (Announce Type: cross)

Abstract: Evaluating generative AI models is increasingly resource-intensive due to slow inference, expensive raters, and a rapidly growing landscape of models and benchmarks. We propose ProEval, a proactive evaluation framework that leverages transfer learning to efficiently estimate performance and identify failure cases. ProEval employs pre-trained Gaussian Processes (GPs) as surrogates for the performance score function, mapping model inputs to metrics
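The excerpt stops before the method details, so the following is only a rough sketch of how a GP surrogate over inputs could serve both stated goals: estimating overall performance and steering a small evaluation budget toward likely failures. Everything concrete here is an assumption, not taken from the paper. Inputs are represented by hypothetical embedding vectors X. The "pre-trained" surrogate is approximated by a prior-mean vector of scores from a related model. The RBF kernel with fixed hyperparameters, the failure threshold of 0.5, and the probability-of-failure rule for picking the next input are illustrative choices. The helpers rbf_kernel, gp_posterior, and proactive_eval are hypothetical names, and the data are synthetic.

```python
import math
import numpy as np

def rbf_kernel(a, b, lengthscale=1.0, signal_var=0.1):
    # Squared-exponential kernel between the rows of a and the rows of b
    # (hyperparameters fixed by hand for illustration).
    sq = np.sum(a**2, 1)[:, None] + np.sum(b**2, 1)[None, :] - 2.0 * a @ b.T
    return signal_var * np.exp(-0.5 * np.maximum(sq, 0.0) / lengthscale**2)

def gp_posterior(X, obs_idx, y_obs, prior_mean, noise=1e-3):
    """GP posterior mean and std of the score at every row of X.

    prior_mean stands in for a transferred (pre-trained) surrogate: an
    assumed vector of predicted scores, one per input.
    """
    K = rbf_kernel(X, X)
    K_oo = K[np.ix_(obs_idx, obs_idx)] + noise * np.eye(len(obs_idx))
    K_ao = K[:, obs_idx]
    L = np.linalg.cholesky(K_oo)
    # alpha = K_oo^{-1} (y - m) via two triangular solves.
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_obs - prior_mean[obs_idx]))
    mean = prior_mean + K_ao @ alpha
    v = np.linalg.solve(L, K_ao.T)
    var = np.diag(K) - np.sum(v**2, axis=0)
    return mean, np.sqrt(np.maximum(var, 1e-12))

_erf = np.vectorize(math.erf)

def norm_cdf(z):
    # Standard normal CDF, elementwise.
    return 0.5 * (1.0 + _erf(z / math.sqrt(2.0)))

def proactive_eval(X, evaluate, prior_mean, budget, threshold=0.5):
    """Query `budget` inputs, preferring those most likely to score below threshold.

    evaluate(i) stands in for the expensive step (model inference plus rater).
    Returns (estimated mean score over all of X, failing indices, queried indices).
    """
    obs = [int(np.argmin(prior_mean))]  # start where the prior expects the worst score
    ys = [evaluate(obs[0])]
    for _ in range(budget - 1):
        mean, std = gp_posterior(X, obs, np.array(ys), prior_mean)
        p_fail = norm_cdf((threshold - mean) / std)  # P(score < threshold)
        p_fail[obs] = -1.0  # never re-query an already evaluated input
        nxt = int(np.argmax(p_fail))
        obs.append(nxt)
        ys.append(evaluate(nxt))
    mean, _ = gp_posterior(X, obs, np.array(ys), prior_mean)
    failures = [i for i, y in zip(obs, ys) if y < threshold]
    return float(mean.mean()), failures, obs

# Synthetic usage: 200 hypothetical inputs, a prior from a slightly different "related model".
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
true_score = 1.0 / (1.0 + np.exp(-2.0 * X[:, 0]))       # unknown in practice
prior = 1.0 / (1.0 + np.exp(-(2.0 * X[:, 0] + 0.3)))    # assumed transferred surrogate
budget = 20
est, failures, queried = proactive_eval(X, lambda i: float(true_score[i]), prior, budget)
print(f"estimated mean {est:.3f} vs true {true_score.mean():.3f}; "
      f"{len(failures)} failures in {len(queried)} queries")
```

Picking inputs by probability of failure concentrates the budget on likely failures, while averaging the posterior mean over all inputs still gives a performance estimate for the whole set. A practical system would fit the kernel hyperparameters and the prior from data rather than fixing them by hand as done here.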