Self-Bootstrapping Automated Program Repair: Using LLMs to Generate and Evaluate Synthetic Training Data for Bug Repair

📰 ArXiv cs.AI

Using LLMs to generate and evaluate synthetic training data for automated program repair

advanced Published 31 Mar 2026

Action Steps

Generate synthetic training data using LLMs to supplement limited real-world data
Evaluate the generated data to ensure its quality and diversity
Use the synthetic data to train APR models, improving their ability to repair bugs across multiple programming languages
Fine-tune the APR models using the synthetic data to adapt to new bug types and programming languages
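
The generate-then-evaluate steps above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the paper's method: the summary does not say how candidates are produced or scored, so the test-based filter, the whitespace-based deduplication, and every name here (RepairPair, passes, build_dataset, fake_llm_propose_bugs) are hypothetical. The LLM call is replaced by a hard-coded stand-in so the example runs on its own.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass(frozen=True)
class RepairPair:
    buggy: str  # source code containing an injected bug
    fixed: str  # reference (correct) source code


def passes(source: str, test: Callable[[dict], bool]) -> bool:
    """Execute `source` in a fresh namespace and run `test` against it."""
    namespace: dict = {}
    try:
        # Assumption: sources are trusted here; real pipelines should sandbox this.
        exec(source, namespace)
        return bool(test(namespace))
    except Exception:
        return False


def build_dataset(
    fixed: str,
    propose_bugs: Callable[[str], Iterable[str]],  # hypothetical LLM wrapper
    test: Callable[[dict], bool],
) -> List[RepairPair]:
    """Keep candidates where the fix passes and the bug fails; drop duplicates."""
    if not passes(fixed, test):
        return []
    seen = set()
    pairs = []
    for buggy in propose_bugs(fixed):
        key = " ".join(buggy.split())  # whitespace-normalised text for dedup
        if key in seen or passes(buggy, test):
            continue  # duplicate, or the "bug" does not change tested behaviour
        seen.add(key)
        pairs.append(RepairPair(buggy=buggy, fixed=fixed))
    return pairs


# Stand-in for an LLM call (assumption): returns hand-written variants.
def fake_llm_propose_bugs(fixed: str) -> List[str]:
    return [
        fixed.replace("a + b", "a - b"),  # real bug: wrong operator
        fixed.replace("a + b", "a - b"),  # duplicate, should be filtered out
        fixed.replace("a + b", "b + a"),  # behaviour-preserving, filtered out
    ]


FIXED = "def add(a, b):\n    return a + b\n"


def add_test(ns: dict) -> bool:
    return ns["add"](2, 3) == 5 and ns["add"](0, 0) == 0


dataset = build_dataset(FIXED, fake_llm_propose_bugs, add_test)
```

The filter keeps a candidate only when the reference code passes the test and the proposed buggy version fails it, which is one simple proxy for quality; deduplication is a very crude proxy for diversity. The surviving (buggy, fixed) pairs are the kind of records that could then feed APR training or fine-tuning.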

Who Needs to Know This

Software engineers and AI researchers can benefit from this approach: it strengthens automated program repair, which in turn improves the quality and efficiency of software development

Key Insight

💡 LLMs can both generate and evaluate synthetic training data, supplementing scarce real-world bug-fix data for automated program repair systems