Hugging Face 🤗 Evaluate adds toxicity, polarity, and HONEST bias metrics for GPT-2 and BLOOM
AI Impact Summary
Hugging Face expands 🤗 Evaluate with built-in bias metrics (toxicity, polarity via Regard, and HONEST) and demonstrates practical prompts using GPT-2 and BLOOM. The workflow feeds WinoBias and BOLD prompts to the models and scores the generations with the R4 Target model as a toxicity classifier, showing that small prompt changes can flip toxicity and polarity signals. This lets engineering teams integrate systematic bias measurement into model evaluation, but it also highlights dataset and metric limitations that must be addressed for robust bias assessment in production.
Affected Systems
- Date: not specified
- Change type: capability
- Severity: info