Nyströmformer enables linear-time self-attention in HuggingFace Transformers via Nyström landmarks
AI Impact Summary
Nyströmformer replaces standard O(n^2) self-attention with a Nyström-based approximation: query and key landmarks are computed as segment means of the full query and key matrices, and the softmax attention matrix is approximated by a product of three smaller matrices built from those landmarks. This reduces complexity to O(n) and makes training and inference on longer sequences (e.g., 4k–8k tokens) practical with competitive accuracy, as demonstrated by the HuggingFace Transformers implementation with PyTorch code samples. Expect rewrites along the attention path around the landmark computation, plus accuracy/latency tradeoffs that depend on the number of landmarks (typically 32–64) and the segmentation strategy.
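To make the three-matrix factorization concrete, here is a minimal PyTorch sketch of the approximation, not the HuggingFace implementation itself: the function name `nystrom_attention`, the `num_landmarks` default, and the assumption that the sequence length divides evenly into segments are illustrative choices, and the exact `torch.linalg.pinv` stands in for the iterative Moore–Penrose approximation the paper uses.

```python
import torch
import torch.nn.functional as F


def nystrom_attention(q, k, v, num_landmarks=32):
    """Nystrom-approximated self-attention (illustrative sketch).

    q, k, v: (batch, seq_len, dim). Assumes seq_len is divisible by
    num_landmarks; real implementations pad the sequence to make it so.
    """
    b, n, d = q.shape
    m = num_landmarks
    scale = d ** -0.5

    # Landmarks: means over m contiguous segments of the queries and keys.
    q_landmarks = q.reshape(b, m, n // m, d).mean(dim=2)  # (b, m, d)
    k_landmarks = k.reshape(b, m, n // m, d).mean(dim=2)  # (b, m, d)

    # The three factors of the Nystrom approximation.
    kernel_1 = F.softmax(q @ k_landmarks.transpose(-1, -2) * scale, dim=-1)            # (b, n, m)
    kernel_2 = F.softmax(q_landmarks @ k_landmarks.transpose(-1, -2) * scale, dim=-1)  # (b, m, m)
    kernel_3 = F.softmax(q_landmarks @ k.transpose(-1, -2) * scale, dim=-1)            # (b, m, n)

    # Multiply right-to-left so no n-by-n matrix is ever formed:
    # every product is O(n * m * d) or O(n * m^2), linear in n for fixed m.
    return kernel_1 @ torch.linalg.pinv(kernel_2) @ (kernel_3 @ v)


# Example: a 4096-token sequence with 64 landmarks.
q = k = v = torch.randn(1, 4096, 64)
out = nystrom_attention(q, k, v, num_landmarks=64)  # (1, 4096, 64)
```

Because only n-by-m and m-by-m matrices appear, memory and compute grow linearly in the sequence length; the paper replaces the exact pseudoinverse with a few iterations of an approximate scheme to keep the whole path GPU-friendly.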
Affected Systems
- Date: not specified
- Change type: capability
- Severity: info