Optimum Intel adds OpenVINO Runtime support and NNCF quantization for Transformer models
AI Impact Summary
Intel OpenVINO is now integrated into Optimum Intel, enabling OpenVINO Runtime-based inference for Transformer models on Intel hardware via the OVQuantizer and NNCF tooling. The release exports OpenVINO models in the IR format (XML/BIN) and supports post-training static quantization for encoder models, with initial ViT benchmarks showing up to 3.8x memory reduction and ~2.4x lower latency. Quantization requires a calibration dataset and a warmup/compilation step, and encoder-decoder quantization remains disabled until the next OpenVINO release. Models can be loaded from the Hugging Face Hub or from a local directory, broadening deployment options for edge and data center environments.
Affected Systems
- Date: not specified
- Change type: capability
- Severity: info