Reinforcement Learning for Multi-Turn Software Engineering Agents

PaperVideos · Advanced · 🧠 Large Language Models · 7mo ago
This research explores training large language models (LLMs) as software engineering (SWE) agents with reinforcement learning (RL), moving beyond single-turn problems to complex, multi-turn interactions. The authors introduce a modified version of the Decoupled Clip and Dynamic sAmpling Policy Optimization (DAPO) algorithm to improve an agent's ability to solve real-world SWE tasks. Their two-phase training pipeline, rejection fine-tuning followed by multi-turn RL, significantly improves the agent's success rate on benchmarks like SWE-bench Verified. The study highlights the challenges of long-h…
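As a rough illustration of the two phases named in the summary, the sketch below shows one way the training signal could be prepared: rejection fine-tuning keeps only successful trajectories, and the RL phase uses group-normalized rewards in the style of DAPO-like methods, skipping groups where every sample received the same reward. The `Trajectory` type, its fields, and the binary reward are assumptions made for illustration, not the authors' implementation.

```python
from dataclasses import dataclass
from statistics import mean, pstdev


@dataclass
class Trajectory:
    task_id: str
    turns: list[str]  # hypothetical: agent actions and observations, in order
    reward: float     # hypothetical: 1.0 if the final patch passes the tests, else 0.0


def rejection_filter(trajectories):
    """Phase 1 (assumed form): keep only successful trajectories for fine-tuning."""
    return [t for t in trajectories if t.reward > 0.0]


def group_advantages(group, eps=1e-6):
    """Phase 2 (assumed form): normalize rewards within one task's sample group.

    Returns None when all rewards are equal, since such a group carries no
    learning signal and would be skipped under dynamic sampling.
    """
    rewards = [t.reward for t in group]
    mu, sigma = mean(rewards), pstdev(rewards)
    if sigma == 0.0:
        return None
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four hypothetical rollouts for a single task: two succeed, two fail.
group = [Trajectory("task-1", [], r) for r in (1.0, 0.0, 0.0, 1.0)]
sft_data = rejection_filter(group)
advantages = group_advantages(group)
```

In this toy group, the successful rollouts receive positive advantages and the failed ones negative advantages, while a group in which every rollout fails (or every rollout succeeds) is dropped rather than contributing a zero gradient.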