SWE-IF: Aligning Code Evaluation with Human Preference

📰 ArXiv cs.AI

arXiv:2510.07315v2 Abstract: Large Language Models (LLMs) have catalyzed vibe coding, where users leverage LLMs to generate and iteratively refine code through natural language interactions until it passes their vibe check. Vibe check reflects human preference and goes beyond functionality: the solution should feel right, read cleanly, preserve intent, and remain correct. However, current code evaluation remains anchored to pass@k and captures only functional correct

Published 8 Jun 2026