Position: Coding Benchmarks Are Misaligned with Agentic Software Engineering

arXiv cs.AI

arXiv:2606.17799v1 (Announce Type: cross)

Abstract: Coding agents have become a major mode of software engineering, but the benchmarks we use to compare them were designed in a pre-agent era: they collapse model, harness, and environment into a single end-to-end score, typically computed against one reference solution, with no component-level signal for iteration. We argue that current coding benchmarks are misaligned with agentic software engineering. A coding agent in practice is not a model: it

Published 17 Jun 2026