IBM and UC Berkeley Diagnose Why Enterprise Agents Fail Using IT-Bench and MAST

📰 Hugging Face Blog

Advanced · Published 18 Feb 2026

Action Steps
  1. Use IT-Bench to evaluate the performance of enterprise agents
  2. Analyze the results to identify failure modes and areas for improvement
  3. Apply MAST to diagnose and address the root causes of failure
  4. Compare models, such as Gemini-3-Flash and Kimi-K2, on the same tasks to see where their failure patterns differ
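
The steps above can be sketched as a small tally of failure labels per model. This is a minimal illustration, not the IT-Bench or MAST tooling itself: the record layout (`model`, `success`, `failure_labels`), the helper names, and the placeholder values are all assumptions made for this example, and real labels would come from annotating actual benchmark traces.

```python
from collections import Counter, defaultdict


def failure_profile(traces):
    """Count failure labels per model, looking only at failed runs.

    Each trace is a dict with hypothetical keys:
    "model" (str), "success" (bool), "failure_labels" (list of str).
    """
    profile = defaultdict(Counter)
    for trace in traces:
        if trace["success"]:
            continue  # successful runs carry no failure labels
        for label in trace["failure_labels"]:
            profile[trace["model"]][label] += 1
    return profile


def top_failures(profile, n=3):
    """Return the n most frequent failure labels for each model."""
    return {model: counts.most_common(n) for model, counts in profile.items()}


# Placeholder records only; these are not benchmark results.
traces = [
    {"model": "model-a", "success": False, "failure_labels": ["label-1", "label-2"]},
    {"model": "model-a", "success": False, "failure_labels": ["label-1"]},
    {"model": "model-a", "success": True, "failure_labels": []},
    {"model": "model-b", "success": False, "failure_labels": ["label-3"]},
]

profile = failure_profile(traces)
summary = top_failures(profile, n=1)
```

Comparing the per-model summaries side by side is one way to see whether two models fail on the same steps or in different ways, which is the point of step 4.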
Who Needs to Know This

AI researchers and engineers who build or deploy enterprise agents: the article shows where these agents fall short and how to diagnose and improve them with IT-Bench and MAST.

Key Insight

💡 Enterprise agents fail for many different reasons. Evaluating them with IT-Bench and diagnosing the failures with MAST helps identify the root causes and improve performance.
