Nyströmformer enables linear-time self-attention in HuggingFace Transformers via Nyström landmarks
AI Impact Summary
Nyströmformer replaces standard O(n^2) self-attention with a Nyström-based approximation: query and key landmarks are computed as segment means of the full query and key matrices, and the softmax attention matrix is approximated by a product of three smaller matrices built from those landmarks. This reduces complexity to O(n) and makes training and inference on longer sequences (e.g., 4k–8k tokens) practical with competitive accuracy, as demonstrated by the HuggingFace Transformers implementation with PyTorch code samples. Expect rewrites along the attention path around the landmark computation, plus accuracy/latency tradeoffs that depend on the number of landmarks (typically 32–64) and the segmentation strategy.
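To make the three-matrix factorization concrete, here is a minimal PyTorch sketch of the approximation, not the HuggingFace implementation itself: the function name `nystrom_attention`, the `num_landmarks` default, and the assumption that the sequence length divides evenly into segments are illustrative choices, and the exact `torch.linalg.pinv` stands in for the iterative Moore–Penrose approximation the paper uses.

```python
import torch
import torch.nn.functional as F


def nystrom_attention(q, k, v, num_landmarks=32):
    """Nystrom-approximated self-attention (illustrative sketch).

    q, k, v: (batch, seq_len, dim). Assumes seq_len is divisible by
    num_landmarks; real implementations pad the sequence to make it so.
    """
    b, n, d = q.shape
    m = num_landmarks
    scale = d ** -0.5

    # Landmarks: means over m contiguous segments of the queries and keys.
    q_landmarks = q.reshape(b, m, n // m, d).mean(dim=2)  # (b, m, d)
    k_landmarks = k.reshape(b, m, n // m, d).mean(dim=2)  # (b, m, d)

    # The three factors of the Nystrom approximation.
    kernel_1 = F.softmax(q @ k_landmarks.transpose(-1, -2) * scale, dim=-1)            # (b, n, m)
    kernel_2 = F.softmax(q_landmarks @ k_landmarks.transpose(-1, -2) * scale, dim=-1)  # (b, m, m)
    kernel_3 = F.softmax(q_landmarks @ k.transpose(-1, -2) * scale, dim=-1)            # (b, m, n)

    # Multiply right-to-left so no n-by-n matrix is ever formed:
    # every product is O(n * m * d) or O(n * m^2), linear in n for fixed m.
    return kernel_1 @ torch.linalg.pinv(kernel_2) @ (kernel_3 @ v)


# Example: a 4096-token sequence with 64 landmarks.
q = k = v = torch.randn(1, 4096, 64)
out = nystrom_attention(q, k, v, num_landmarks=64)  # (1, 4096, 64)
```

Because only n-by-m and m-by-m matrices appear, memory and compute grow linearly in the sequence length; the paper replaces the exact pseudoinverse with a few iterations of an approximate scheme to keep the whole path GPU-friendly.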
Affected Systems
- Date: not specified
- Change type: capability
- Severity: info