AutoBaxBuilder: Bootstrapping Code Security Benchmarking

📰 ArXiv cs.AI

arXiv:2512.21132v2

Abstract: As large language models (LLMs) see wide adoption in software engineering, the reliable assessment of the correctness and security of LLM-generated code is crucial. Notably, prior work showed that LLMs are prone to generating code with security vulnerabilities, highlighting that security is often overlooked. These insights were enabled by specialized benchmarks crafted by security experts through significant manual effort. However, benchm

Published 23 May 2026
