ACE-Bench: A Lightweight Benchmark for Evaluating Azure SDK Usage Correctness
📰 ArXiv cs.AI
arXiv:2604.09564v1 Announce Type: cross Abstract: We present ACE-Bench (Azure SDK Coding Evaluation Benchmark), an execution-free benchmark that provides fast, reproducible pass or fail signals for whether large language model (LLM)-based coding agents use Azure SDKs correctly, without provisioning cloud resources or maintaining fragile end-to-end test environments. ACE-Bench turns official Azure SDK documentation examples into self-contained coding tasks and validates solutions with task-specifi