Anthropic partners with NNSA to deploy AI classifier for nuclear proliferation risk detection
AI Impact Summary
Anthropic is partnering with the U.S. Department of Energy’s National Nuclear Security Administration (NNSA) to develop safeguards against misuse of its Claude models for nuclear proliferation. The core of the effort is a new classifier that automatically flags concerning nuclear-related conversations with a reported 96% accuracy; it is deployed initially on Claude traffic. Combined with ongoing risk assessments, this proactive approach is intended to mitigate national security threats posed by increasingly capable AI models and to serve as a model for other AI developers.
Affected Systems
- Date: not specified
- Change type: capability
- Severity: medium