IBM and UC Berkeley Diagnose Enterprise Agent Failures with ITBench and MAST
Action Required
Organizations relying on agentic systems for IT automation need to understand and mitigate the identified failure modes to ensure reliable performance and avoid costly downtime.
AI Impact Summary
IBM and UC Berkeley have identified key failure modes in enterprise agent systems using the ITBench benchmark and the MAST taxonomy. The research finds that large open models like GPT-OSS-120B exhibit cascading failure patterns, in which several failure modes compound within a single run, while frontier models like Gemini-3-Flash show more isolated and predictable failures. Understanding these failure modes, particularly the prevalence of 'Non-Fatal' flaws like incorrect verification and premature termination, is crucial for building more robust and reliable agentic systems for IT automation tasks.
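One way to picture the difference between isolated and cascading failures is to count how many distinct failure modes are annotated on each agent trace. The sketch below is illustrative only: the trace data, the `classify_trace` and `summarize` helpers, the extra "step repetition" label, and the `cascade_threshold` cutoff are all assumptions made for this example, not part of ITBench, MAST, or the study's methodology.

```python
from collections import Counter

# Hypothetical annotated traces: trace id -> failure-mode labels assigned
# by an annotator. Labels and data are made up for illustration.
TRACES = {
    "trace-a": ["incorrect verification"],
    "trace-b": ["premature termination", "incorrect verification",
                "step repetition", "incorrect verification"],
    "trace-c": [],
}


def classify_trace(labels, cascade_threshold=3):
    """Label a trace by how many distinct failure modes it contains.

    cascade_threshold is an assumed cutoff, not a value from the study.
    """
    distinct = len(set(labels))
    if distinct == 0:
        return "clean"
    if distinct >= cascade_threshold:
        return "cascading"
    return "isolated"


def summarize(traces):
    """Return the pattern for each trace and how many traces show each mode."""
    patterns = {tid: classify_trace(labels) for tid, labels in traces.items()}
    # Count each failure mode once per trace so repeats do not inflate it.
    frequency = Counter(
        label for labels in traces.values() for label in set(labels)
    )
    return patterns, frequency


patterns, frequency = summarize(TRACES)
print(patterns)
print(frequency.most_common())
```

Counting distinct modes per trace, rather than raw label occurrences, keeps a single repeated flaw from being mistaken for a cascade. A real analysis would likely need a cutoff chosen from the annotated data.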
Affected Systems
- Date: not specified
- Change type: capability
- Severity: high