Anthropic detects and prevents AI labs from distilling Claude's capabilities
Action Required
The proliferation of unprotected AI models enabled by distillation attacks could compromise national security and undermine Anthropic's ability to maintain a competitive advantage.
AI Impact Summary
Anthropic is proactively addressing a significant security threat: industrial-scale distillation attacks by AI labs seeking to illicitly extract capabilities from Claude. Three labs (DeepSeek, Moonshot, and MiniMax) have been identified engaging in these attacks, generating millions of exchanges with Claude to train their own less capable models. This poses a national security risk because dangerous capabilities could proliferate without safeguards, particularly through open-sourced distilled models.
Affected Systems
- Date: not specified
- Change type: capability
- Severity: critical