Willful Disobedience: Automatically Detecting Failures in Agentic Traces

📰 ArXiv cs.AI

AgentPex automatically detects failures in agentic traces, improving validation of AI agents in software systems

advanced Published 26 Mar 2026

Action Steps

Collect agentic traces from AI agents executing multi-step workflows
Analyze traces for procedural failures, such as incorrect workflow routing or unsafe tool usage
Use AgentPex to automatically detect failures and validate AI agent behavior
Integrate AgentPex into existing testing and validation pipelines to improve overall system reliability

Who Needs to Know This

AI engineers and researchers benefit from AgentPex as it helps identify critical procedural failures in agentic traces, while product managers and DevOps teams can use it to improve the reliability of AI-powered systems

Key Insight

💡 Procedural failures in agentic traces can be automatically detected using AgentPex, improving the validation of AI agents in software systems