ONNX Runtime accelerates 130k+ Hugging Face models, with Whisper-tiny latency reduced by ~74%
AI Impact Summary
ONNX Runtime now accelerates a large library of Hugging Face models, with over 130,000 ONNX-enabled models available on the Hugging Face Hub. Deployments can see substantial latency improvements; for example, Whisper-tiny achieves roughly a 74.3% latency reduction versus PyTorch. More than 90 model architectures are supported, including BERT, GPT2, DistilBERT, RoBERTa, T5, Wav2Vec2, Stable-Diffusion, and Whisper. This broad compatibility offers a viable path to raising inference throughput and reducing compute costs across production workloads that rely on these models.
Affected Systems
- Date: not specified
- Change type: capability
- Severity: info