Worst-Case Frontier Risks of Open-Weight LLMs - gpt-oss Malicious Fine-Tuning Study
AI Impact Summary
This research investigates the potential for significant harm arising from the release of open-weight LLMs, focusing specifically on the risk of malicious fine-tuning. The study simulates a scenario in which an adversary aggressively fine-tunes gpt-oss to maximize its capabilities in sensitive domains such as biology and cybersecurity. The findings highlight the urgent need to understand and mitigate the risks of uncontrolled model development and deployment, particularly for capabilities that could be weaponized.
Affected Systems
gpt-oss open-weight models and any downstream systems built on fine-tuned variants of them.
Business Impact
The potential for gpt-oss to be exploited through malicious fine-tuning poses a serious threat to data security, scientific integrity, and potentially national security.
- Date: not specified
- Change type: capability
- Severity: medium