Anthropic's Constitutional AI with Open LLMs — self-critique for alignment
AI Impact Summary
Anthropic's Constitutional AI (CAI) technique offers a novel approach to aligning open large language models (LLMs) like Mistral 7B Instruct by iteratively critiquing and revising model outputs based on a user-defined constitution. This process, exemplified by prompting the model to identify and correct responses violating ethical principles, demonstrates a scalable method for reducing harmful outputs without requiring extensive human feedback. The development of the llm-swarm tool, leveraging Slurm clusters and TGI, further enhances this approach by enabling the generation of synthetic CAI datasets for training and experimentation.
Affected Systems
- Date
- Date not specified
- Change type
- capability
- Severity
- info