BigBird block-sparse attention enables 4096-token sequences in the 🤗 Transformers BigBird RoBERTa-like model
AI Impact Summary
BigBird replaces full attention with a block-sparse scheme that combines sliding, global, and random attention to approximate BERT-style full attention. This cuts the quadratic memory and compute cost of full attention, allowing transformer models to handle sequences of up to roughly 4096 tokens. In practice, 🤗 Transformers now offers a BigBird RoBERTa-like model, so teams can experiment with longer-context NLP tasks (long-document QA, summarization) at a lower resource footprint. Note that the block-sparse pattern is an approximation and may capture different token interactions than full attention does.
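As a rough illustration (not part of the original summary), the sketch below shows how such a model could be loaded and run on a long input. It assumes the public 🤗 Transformers BigBird classes and the google/bigbird-roberta-base checkpoint; keyword arguments such as attention_type, block_size, and num_random_blocks may vary between library versions.

```python
# Minimal sketch: load a BigBird RoBERTa-like model with block-sparse attention
# and run it on a long input. The checkpoint name and the config kwargs
# (attention_type, block_size, num_random_blocks) are assumptions based on the
# public 🤗 Transformers API and may differ across versions.
from transformers import BigBirdModel, BigBirdTokenizer

tokenizer = BigBirdTokenizer.from_pretrained("google/bigbird-roberta-base")
model = BigBirdModel.from_pretrained(
    "google/bigbird-roberta-base",
    attention_type="block_sparse",  # sliding + global + random attention
    block_size=64,                  # tokens per attention block
    num_random_blocks=3,            # random blocks each query block attends to
)

long_text = "your long document here " * 400  # roughly a few thousand tokens
inputs = tokenizer(
    long_text, return_tensors="pt", truncation=True, max_length=4096
)
outputs = model(**inputs)
print(outputs.last_hidden_state.shape)  # (1, seq_len, hidden_size)
```

Switching attention_type to "original_full" would fall back to standard quadratic attention, which is one way to compare the approximation against full attention on shorter inputs.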
Affected Systems
- Date: not specified
- Change type: capability
- Severity: info