Self-Bootstrapping Automated Program Repair: Using LLMs to Generate and Evaluate Synthetic Training Data for Bug Repair

arXiv cs.AI

Using LLMs to generate and evaluate synthetic training data for automated program repair

Level: advanced. Published 31 Mar 2026
Action Steps
  1. Use LLMs to generate synthetic bug-repair training data that supplements scarce real-world examples
  2. Evaluate the generated samples and keep only those that meet quality and diversity criteria
  3. Train automated program repair (APR) models on the filtered synthetic data to improve bug repair across multiple programming languages
  4. Fine-tune the APR models on further synthetic data to adapt them to new bug types and programming languages
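Steps 1 and 2 can be pictured as a filter applied to LLM-generated candidates before training. The sketch below is a minimal illustration under stated assumptions, not the paper's pipeline. The `Candidate` record, the toy `make_oracle` test runner, the whitespace-normalized hash used as a crude diversity check, and the prompt/completion record format are all hypothetical. The LLM generation call is omitted, and hard-coded candidates stand in for model output.

```python
# Hypothetical sketch: validate and deduplicate LLM-generated bug/fix pairs,
# then serialize the survivors as fine-tuning records. Not the paper's method.
import hashlib
import json
from dataclasses import dataclass
from typing import Callable, Iterable, List, Tuple


@dataclass
class Candidate:
    """One synthetic sample (field names are illustrative assumptions)."""
    language: str
    buggy: str
    fixed: str
    passes_tests: Callable[[str], bool]  # True if the given source passes its tests


def make_oracle(func_name: str, cases: List[Tuple[tuple, object]]) -> Callable[[str], bool]:
    """Toy test oracle: exec the source and compare outputs on fixed inputs.

    Unsafe for untrusted code; a real pipeline would sandbox test execution.
    """
    def run(src: str) -> bool:
        ns: dict = {}
        try:
            exec(src, ns)
            return all(ns[func_name](*args) == want for args, want in cases)
        except Exception:
            return False
    return run


def normalize(src: str) -> str:
    # Collapse all whitespace so trivially reformatted duplicates compare equal.
    return " ".join(src.split())


def is_valid(c: Candidate) -> bool:
    # Quality check: the bug must be real (tests fail) and the fix must work (tests pass).
    return not c.passes_tests(c.buggy) and c.passes_tests(c.fixed)


def filter_candidates(cands: Iterable[Candidate]) -> List[Candidate]:
    seen = set()
    kept = []
    for c in cands:
        if not is_valid(c):
            continue
        key = hashlib.sha256(
            "\0".join([c.language, normalize(c.buggy), normalize(c.fixed)]).encode()
        ).hexdigest()
        if key in seen:  # crude diversity check: drop near-verbatim duplicates
            continue
        seen.add(key)
        kept.append(c)
    return kept


def to_training_record(c: Candidate) -> str:
    # One JSON line per example, in an assumed prompt/completion layout.
    return json.dumps({
        "prompt": f"Fix the bug in this {c.language} code:\n{c.buggy}",
        "completion": c.fixed,
    })


# Usage: hard-coded stand-ins for LLM output.
oracle = make_oracle("add", [((1, 2), 3), ((0, 0), 0)])
FIX = "def add(a, b):\n    return a + b"
cands = [
    Candidate("python", "def add(a, b):\n    return a - b", FIX, oracle),
    # Same sample up to whitespace: removed by the diversity check.
    Candidate("python", "def add(a, b):\n    return a  -  b", FIX, oracle),
    # "Fix" that still fails its tests: removed by the quality check.
    Candidate("python", "def add(a, b):\n    return a - b", "def add(a, b):\n    return a * b", oracle),
]
kept = filter_candidates(cands)
print(len(kept))  # 1
for c in kept:
    print(to_training_record(c))
```

Exact-hash deduplication only catches near-verbatim repeats. Measuring real diversity, for example across bug types or languages, would need a richer similarity measure. Running generated code with `exec` is acceptable only in a toy like this. A real pipeline would run the tests in an isolated sandbox.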
Who Needs to Know This

Software engineers and AI researchers can benefit from this approach: stronger automated program repair improves both the quality and the efficiency of software development

Key Insight

💡 LLMs can be used to generate high-quality synthetic training data, enhancing the capabilities of automated program repair systems
