InfoCapability

Anthropic's Constitutional AI with Open LLMs — self-critique for alignment

AI Impact Summary

Anthropic's Constitutional AI (CAI) technique offers a novel approach to aligning open large language models (LLMs) like Mistral 7B Instruct by iteratively critiquing and revising model outputs based on a user-defined constitution. This process, exemplified by prompting the model to identify and correct responses violating ethical principles, demonstrates a scalable method for reducing harmful outputs without requiring extensive human feedback. The development of the llm-swarm tool, leveraging Slurm clusters and TGI, further enhances this approach by enabling the generation of synthetic CAI datasets for training and experimentation.

Affected Systems

Mistral-7B-Instruct-v0.1Llama 2

Date: Date not specified
Change type: capability
Severity: info

Anthropic's Constitutional AI with Open LLMs — self-critique for alignment

More from Hugging Face

Get alerts for Hugging Face