RWKV: RNN-Transformer hybrid now supported in Hugging Face Transformers
AI Impact Summary
RWKV is an attention-free architecture that combines RNN-style inference with transformer-style parallel training, and it is now integrated into Hugging Face Transformers, enabling deployment through familiar APIs and the Hugging Face Hub. It supports long context lengths (up to 8192 tokens, ctx8192), scales from 170M to 14B parameters, and offers a chat-optimized RWKV-4 Raven variant fine-tuned on Alpaca, CodeAlpaca, Guanaco, GPT4All, and ShareGPT, among others, with performance aided by training tricks such as TokenShift and SmallInitEmb. Technical teams should evaluate RWKV for long-context chat applications, plan to load models through the Transformers integration (potentially from source or the main branch until a release includes it), and account for tokenization and quantization considerations as well as existing fine-tuning pipelines.
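A minimal sketch of loading an RWKV checkpoint through the standard Transformers auto classes, assuming the installed version of Transformers already ships the RWKV integration (otherwise install from the main branch, as noted above). The checkpoint id and generation settings below are illustrative assumptions, not taken from the summary.

```python
# Minimal sketch: load an RWKV model via the standard Transformers API.
# Assumptions: transformers is new enough to include RWKV support, and the
# Hub id "RWKV/rwkv-4-169m-pile" is a valid checkpoint (swap in a Raven chat
# checkpoint for chat use cases).
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "RWKV/rwkv-4-169m-pile"  # assumed Hub id for illustration

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Standard causal-LM generation; RWKV's recurrent state is managed internally,
# which is what allows the long-context behavior described above.
inputs = tokenizer("RWKV is an attention-free architecture that", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

For the Raven chat variants, prompts typically need instruction-style formatting; check the model card of the specific checkpoint for the expected template before wiring it into a chat pipeline.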
Affected Systems
- Date: not specified
- Change type: capability
- Severity: info