ONNX Runtime accelerates 130k+ Hugging Face models, with Whisper-tiny latency reduced by ~74%
AI Impact Summary
ONNX Runtime now accelerates a large library of Hugging Face models, with over 130,000 ONNX-enabled models available on the Hugging Face Hub. Deployments can see substantial latency improvements; for example, Whisper-tiny achieves roughly a 74.3% latency reduction versus PyTorch. More than 90 model architectures are supported, including BERT, GPT2, DistilBERT, RoBERTa, T5, Wav2Vec2, Stable-Diffusion, and Whisper. This broad compatibility offers a viable path to raising inference throughput and reducing compute costs across production workloads that rely on these models.
Affected Systems
- Date: not specified
- Change type: capability
- Severity: info