Foundation models' data labeling capabilities vs human annotators: Vicuna-13B, Koala-13B, Oasst-12B, Dolly-12B benchmarks
AI Impact Summary
Foundation models are being explored as generators of human-like data labels, a capability central to RLHF and benchmark-driven evaluation. The content compares four open-source models (Vicuna-13B, Koala-13B, Oasst-12B, Dolly-12B) against human annotations from Scale AI and GPT-4-based evaluations, highlighting that labeling quality hinges on both the prompts given to the model and the evaluator setup. For technical teams, this underscores the need to anchor automated labeling pipelines to diverse, human-derived benchmarks (e.g., Anthropic HHH data, OpenAssistant rankings) to prevent bias and miscalibration in reward models.
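One practical way to anchor an automated labeling pipeline to human-derived benchmarks, as described above, is to measure agreement between a model judge's labels and human labels on the same items. The sketch below is a minimal, hypothetical example: the label sequences and the `cohens_kappa` helper are illustrative, not from the source, and a production pipeline would pull real annotations (e.g., Scale AI rankings) instead.

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Chance-corrected agreement (Cohen's kappa) between two annotators'
    label sequences over the same items."""
    assert len(labels_a) == len(labels_b) and labels_a
    n = len(labels_a)
    # Observed agreement: fraction of items where the two labelers match.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected agreement under independence, from each labeler's marginals.
    counts_a, counts_b = Counter(labels_a), Counter(labels_b)
    expected = sum(counts_a[c] * counts_b.get(c, 0) for c in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0
    return (observed - expected) / (1 - expected)

# Hypothetical pairwise-preference labels ("A" wins, "B" wins, or "tie")
# from human annotators vs. an LLM judge on the same eight prompts.
human = ["A", "B", "A", "A", "tie", "B", "A", "B"]
model = ["A", "B", "B", "A", "tie", "B", "A", "A"]
print(round(cohens_kappa(human, model), 3))  # → 0.579
```

A kappa well below human inter-annotator agreement on the same benchmark is a signal that the automated judge may be miscalibrated and should not yet replace human labels in the reward-model pipeline.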
Affected Systems
- Date: not specified
- Change type: capability
- Severity: info