IBM and UC Berkeley Diagnose Why Enterprise Agents Fail Using IT-Bench and MAST

📰 Hugging Face Blog

Level: advanced. Published 18 Feb 2026.

Action Steps

Run enterprise agents on IT-Bench tasks and record the outcome and execution trace of each run
Review the failed runs to identify recurring failure modes
Annotate the failed traces with MAST (Multi-Agent System Failure Taxonomy) to trace each failure back to its root cause
Compare failure patterns across models, such as Gemini-3-Flash and Kimi-K2, to see where each one breaks down
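
The steps above can be sketched as a small tallying script. This is a minimal illustration, not the IT-Bench or MAST tooling: it assumes each agent run has already been reduced to a record of (model, task, pass/fail, failure labels). The model names, task ids, and label strings are hypothetical placeholders, and the data is invented purely to show the shape of the comparison.

```python
from collections import Counter, defaultdict

# Hypothetical annotated runs: (model, task_id, passed, failure_labels).
# Model names, task ids, and labels are placeholders, not real results
# and not the official MAST label set.
TRACES = [
    ("model_a", "task-1", True, []),
    ("model_a", "task-2", False, ["premature_termination"]),
    ("model_a", "task-3", False, ["premature_termination", "missing_verification"]),
    ("model_b", "task-1", False, ["missing_verification"]),
    ("model_b", "task-2", True, []),
    ("model_b", "task-3", True, []),
]


def summarize(traces):
    """Return per-model pass rate and failure-label counts over failed runs."""
    totals = Counter()
    passes = Counter()
    labels = defaultdict(Counter)
    for model, _task, passed, failure_labels in traces:
        totals[model] += 1
        if passed:
            passes[model] += 1
        else:
            # A single failed run may carry several failure labels.
            labels[model].update(failure_labels)
    return {
        model: {
            "pass_rate": passes[model] / totals[model],
            "failure_modes": dict(labels[model]),
        }
        for model in totals
    }


if __name__ == "__main__":
    for model, stats in summarize(TRACES).items():
        print(model, f"pass_rate={stats['pass_rate']:.2f}", stats["failure_modes"])
```

Keeping the pass rate and the failure-label counts side by side is the point of the comparison: two models with similar pass rates can still fail in different ways, and the label counts are what indicate which fix to try first.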

Who Needs to Know This

AI researchers and engineers who build or evaluate enterprise agents: IT-Bench shows where those agents fall short, and MAST gives a structured way to diagnose the failures and guide improvements.

Key Insight

💡 Enterprise agents fail for many different reasons. A benchmark such as IT-Bench shows where they fail, and a failure taxonomy such as MAST helps explain why, which makes the failures easier to fix.