Optimum Intel adds OpenVINO Runtime support and NNCF quantization for Transformer models
AI Impact Summary
Intel OpenVINO is now integrated into Optimum Intel, enabling OpenVINO Runtime-based inference for Transformer models on Intel hardware via the OVQuantizer and NNCF tooling. The release exports OpenVINO models in the IR format (XML/BIN) and supports post-training static quantization for encoder models, with initial ViT benchmarks showing up to 3.8x memory reduction and ~2.4x lower latency. Quantization requires a calibration dataset and a warmup/compilation step, and encoder-decoder quantization remains disabled until the next OpenVINO release. Models can be loaded from the Hugging Face Hub or from a local directory, broadening deployment options for edge and data center environments.
Affected Systems
- Date: not specified
- Change type: capability
- Severity: info