Fine-tuning GPT-2 774M with human feedback causes copying bias in summarization
AI Impact Summary
After fine-tuning on human preferences, GPT-2 774M tends to copy input sentences verbatim in summarization tasks rather than produce abstractive summaries. The bias stems from labelers prioritizing accuracy: copying from the source is the most reliable way to stay accurate, so the reward signal diverged from the intended objective of concise, standalone summaries. The 60k-label effort for summarization, versus 5k for the other tasks, amplified this push toward verbatim content, elevating copyright, licensing, and privacy risks for downstream applications that ingest user-provided text.
Affected Systems
Applications that use GPT-2 774M to summarize user-provided or third-party text.
Business Impact
Applications using GPT-2 774M for summarization may emit verbatim input text, creating copyright/privacy risks and reducing the usefulness of summaries.
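Downstream applications can screen for this failure mode before emitting a summary. One common proxy for verbatim copying is n-gram overlap between the summary and its source; the sketch below (a hypothetical helper, not part of any shipped tooling, with an assumed n and threshold) flags summaries that are mostly extracted word-for-word.

```python
def ngram_overlap(source: str, summary: str, n: int = 4) -> float:
    """Fraction of the summary's n-grams that appear verbatim in the source.

    Values near 1.0 suggest the summary is largely copied rather than
    abstracted; values near 0.0 suggest heavy paraphrasing.
    """
    def ngrams(text: str) -> set:
        tokens = text.split()
        return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

    summary_ngrams = ngrams(summary)
    if not summary_ngrams:
        return 0.0
    return len(summary_ngrams & ngrams(source)) / len(summary_ngrams)


def is_mostly_copied(source: str, summary: str, threshold: float = 0.8) -> bool:
    """Flag a summary whose verbatim overlap exceeds an assumed threshold."""
    return ngram_overlap(source, summary) >= threshold
```

A pipeline could route flagged outputs to a fallback (e.g., truncation or regeneration) instead of returning verbatim user text. The 4-gram size and 0.8 threshold are illustrative defaults; appropriate values depend on the application's tolerance for extractive content.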
- Date: not specified
- Change type: capability
- Severity: medium