Willful Disobedience: Automatically Detecting Failures in Agentic Traces

📰 ArXiv cs.AI

AgentPex automatically detects procedural failures in agentic traces (the execution histories of AI agents), which helps validate agents deployed in software systems.

Advanced | Published 26 Mar 2026
Action Steps
  1. Collect agentic traces from AI agents executing multi-step workflows
  2. Analyze traces for procedural failures, such as incorrect workflow routing or unsafe tool usage
  3. Use AgentPex to automatically detect failures and validate AI agent behavior
  4. Integrate AgentPex into existing testing and validation pipelines to improve overall system reliability
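
To make steps 1 and 2 concrete, here is a minimal Python sketch of rule-based trace checking. It is not AgentPex's API or method, which this summary does not describe. The `Step` schema, the tool names (`delete_record`, `confirm_with_user`, `route_to`, `issue_refund`), and both rules are hypothetical, chosen only to illustrate the two failure types named above: unsafe tool usage and incorrect workflow routing.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Step:
    """One recorded agent action (hypothetical trace schema)."""
    tool: str
    args: dict


# A rule inspects a whole trace and returns a violation message, or None.
Rule = Callable[[list[Step]], Optional[str]]


def confirm_before_delete(trace: list[Step]) -> Optional[str]:
    """Unsafe tool usage: delete_record needs an earlier confirm_with_user."""
    confirmed = False
    for i, step in enumerate(trace):
        if step.tool == "confirm_with_user":
            confirmed = True
        elif step.tool == "delete_record" and not confirmed:
            return f"step {i}: delete_record called without prior confirmation"
    return None


def refund_routed_to_billing(trace: list[Step]) -> Optional[str]:
    """Incorrect routing: issue_refund needs an earlier hand-off to billing."""
    routed = False
    for i, step in enumerate(trace):
        if step.tool == "route_to" and step.args.get("queue") == "billing":
            routed = True
        elif step.tool == "issue_refund" and not routed:
            return f"step {i}: issue_refund called before routing to billing"
    return None


def check_trace(trace: list[Step], rules: list[Rule]) -> list[str]:
    """Run every rule over the trace and collect the violations found."""
    return [msg for rule in rules if (msg := rule(trace)) is not None]


RULES = [confirm_before_delete, refund_routed_to_billing]

# A trace that deletes a record without asking the user first.
bad_trace = [
    Step("lookup_record", {"id": 7}),
    Step("delete_record", {"id": 7}),
]

# The same workflow with the confirmation step in place.
good_trace = [
    Step("lookup_record", {"id": 7}),
    Step("confirm_with_user", {"prompt": "Delete record 7?"}),
    Step("delete_record", {"id": 7}),
]

violations = check_trace(bad_trace, RULES)
```

A check like `check_trace` could run as an assertion in an existing test suite, which is one way to read step 4. Hand-written rules of this kind only catch failures someone anticipated in advance.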
Who Needs to Know This

AI engineers and researchers can use AgentPex to identify critical procedural failures in agentic traces. Product managers and DevOps teams can use it to improve the reliability of AI-powered systems.

Key Insight

💡 Procedural failures in agentic traces, such as incorrect workflow routing or unsafe tool usage, can be detected automatically with AgentPex, which strengthens the validation of AI agents in software systems.
