Why we no longer evaluate SWE-bench Verified


OpenAI no longer evaluates models on SWE-bench Verified, citing contamination from training leakage and flawed tests

intermediate Published 23 Feb 2026
Action Steps
  1. Recognize the limitations of SWE-bench Verified
  2. Understand its two main problems: flawed tests and training leakage
  3. Consider using SWE-bench Pro as an alternative
  4. Assess how this changes the way you measure coding progress
Who Needs to Know This

Software engineers and developers who rely on SWE-bench Verified to track coding progress. They should understand the benchmark's limitations and the recommendation to use SWE-bench Pro instead.

Key Insight

💡 SWE-bench Verified is no longer a reliable measure of coding progress due to contamination and flawed tests
