SWE-bench Verified benchmark deprecated — migrate to SWE-bench Pro
AI Impact Summary
The SWE-bench Verified benchmark is becoming unreliable because benchmark data has leaked into model training data (contamination), so its scores no longer measure software engineering progress accurately. This weakens the validity of performance comparisons between models and can steer development strategies in the wrong direction. Moving to SWE-bench Pro provides a more robust and trustworthy evaluation framework and mitigates the risks of relying on the compromised Verified dataset.
Affected Systems
Business Impact
Reliance on the deprecated SWE-bench Verified benchmark will result in inaccurate performance assessments and potentially flawed engineering decisions.
- Date: Not specified
- Change type: Capability
- Severity: Medium