Worst-Case Frontier Risks of Open-Weight LLMs - gpt-oss Malicious Fine-Tuning Study
AI Impact Summary
This research investigates the potential for significant harm arising from the release of open-weight LLMs, focusing specifically on the risk of malicious fine-tuning. The study simulates a scenario in which an adversary aggressively fine-tunes gpt-oss to maximize its capabilities in sensitive domains such as biology and cybersecurity. The findings highlight the urgent need to understand and mitigate the risks of uncontrolled model development and deployment, particularly for capabilities that could be weaponized.
Affected Systems
gpt-oss open-weight models and any downstream systems built on fine-tuned variants of them.
Business Impact
The potential for gpt-oss to be exploited through malicious fine-tuning poses a serious threat to data security, scientific integrity, and potentially national security.
- Date: not specified
- Change type: capability
- Severity: medium