EvoTest: Evolutionary Test-Time Learning for Self-Improving Agentic Systems
arXiv cs.AI
arXiv:2510.13220v2
Abstract: A fundamental limitation of current AI agents is their inability to learn complex skills on the fly at test time, often behaving like "clever but clueless interns" in novel environments. This severely limits their practical utility. To systematically measure and drive progress on this challenge, we first introduce the Jericho Test-Time Learning (J-TTL) benchmark. J-TTL is a new evaluation setup where an agent must play the same game for several