IBM and UC Berkeley Diagnose Enterprise Agent Failures with ITBench and MAST

Action Required

Organizations relying on agentic systems for IT automation need to understand and mitigate the identified failure modes to ensure reliable performance and avoid costly downtime.

AI Impact Summary

IBM and UC Berkeley have identified key failure modes in enterprise agent systems using the ITBench benchmark and the MAST taxonomy. The research finds that larger open models such as GPT-OSS-120B exhibit cascading failure patterns, while frontier models such as Gemini-3-Flash show more isolated, predictable failures. Understanding these failure modes, particularly the prevalence of 'Non-Fatal' flaws such as incorrect verification and premature termination, is crucial for building more robust and reliable agentic systems for IT automation tasks.

Affected Systems

GPT-OSS-120B, Gemini-3-Flash

Date: not specified
Change type: capability
Severity: high
