Towards Self-Evolving Benchmarks: Synthesizing Agent Trajectories via Test-Time Exploration under Validate-by-Reproduce Paradigm
📰 ArXiv cs.AI
TRACE framework synthesizes agent trajectories via test-time exploration for self-evolving benchmarks
Action Steps
- Identify the limitations of existing agent benchmarks
- Propose the TRACE framework for synthesizing agent trajectories
- Implement test-time exploration under the Validate-by-Reproduce paradigm
- Evaluate the effectiveness of the TRACE framework in creating self-evolving benchmarks
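The Validate-by-Reproduce step above can be illustrated with a toy sketch: a trajectory gathered during exploration is kept only if deterministically replaying its actions reproduces the observed outcome. All names and the toy environment below are illustrative assumptions, not details from the paper.

```python
import random

def explore(env_seed, n_steps=5):
    """Test-time exploration (toy): take random actions, record a trajectory."""
    rng = random.Random(env_seed)
    state, actions = 0, []
    for _ in range(n_steps):
        a = rng.choice([-1, 1, 2])  # hypothetical action space
        actions.append(a)
        state += a
    return actions, state  # trajectory and its observed outcome

def replay(actions):
    """Deterministically re-execute the recorded actions."""
    state = 0
    for a in actions:
        state += a
    return state

def validate_by_reproduce(env_seed):
    """Keep a synthesized trajectory only if replay reproduces its outcome."""
    actions, outcome = explore(env_seed)
    return replay(actions) == outcome

# In this fully deterministic toy environment, every trajectory reproduces:
print(all(validate_by_reproduce(seed) for seed in range(10)))  # True
```

In a real agent environment, replay can diverge (stochastic tools, changing web state), which is exactly what this filter would catch by discarding non-reproducible trajectories.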
Who Needs to Know This
AI researchers and engineers: the framework enables more challenging, dynamic benchmarks for evaluating agent abilities, helping them better assess and improve their models
Key Insight
💡 The TRACE framework enables the creation of more challenging and dynamic benchmarks for evaluating agent abilities
Share This
💡 Introducing TRACE: a framework for self-evolving benchmarks via test-time exploration
DeepCamp AI