Optimum-Intel and OpenVINO GenAI enable edge deployment of Hugging Face Transformers
AI Impact Summary
Optimum-Intel combined with OpenVINO GenAI enables edge and client-side deployment of Hugging Face Transformers models by exporting them to OpenVINO IR and running inference through a lightweight Python or C++ runtime, which shrinks the dependency footprint relative to a full framework stack. The workflow uses the OVModelForCausalLM wrapper or the optimum-cli export to produce the IR directory (including converted tokenizers), which is the artifact later loaded for deployment on Intel hardware. To meet latency and footprint targets, weight-only quantization with NNCF (INT8/INT4) is recommended, optionally combined with techniques such as AWQ and scale estimation; the resulting accuracy trade-offs should be validated against benchmarks such as lm-evaluation-harness. Deployment then runs through the OpenVINO GenAI LLMPipeline in Python or the equivalent OpenVINO GenAI C++ API, giving a compact generation pipeline on CPU and other supported devices.
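To make the export step concrete, here is a minimal sketch using the OVModelForCausalLM wrapper; the model ID and output directory are placeholder choices, not values from the source. The same IR can be produced from the shell with `optimum-cli export openvino --model <id> <output_dir>`, which (with openvino-tokenizers installed) also writes the converted tokenizer files the GenAI pipeline expects.

```python
# Sketch: export a Hugging Face causal LM to OpenVINO IR with Optimum-Intel.
# The model ID and output path are illustrative placeholders.
from optimum.intel import OVModelForCausalLM
from transformers import AutoTokenizer

model_id = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"  # any HF causal LM

# export=True converts the PyTorch checkpoint to OpenVINO IR on the fly
model = OVModelForCausalLM.from_pretrained(model_id, export=True)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Writes openvino_model.xml/.bin plus tokenizer files into one directory
model.save_pretrained("tinyllama-ov")
tokenizer.save_pretrained("tinyllama-ov")
```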
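The weight-only quantization path can be driven from the same wrapper. The sketch below assumes a recent optimum-intel release with NNCF installed; the INT4 settings (group size, ratio, AWQ, scale estimation, calibration dataset) are illustrative choices for this example, not recommended defaults, and should be tuned against an accuracy benchmark such as lm-evaluation-harness.

```python
# Sketch: INT4 weight-only quantization during export (NNCF under the hood).
# Parameter values here are illustrative, not tuned recommendations.
from optimum.intel import OVModelForCausalLM, OVWeightQuantizationConfig

model_id = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"

quant_config = OVWeightQuantizationConfig(
    bits=4,               # INT4 weight compression
    group_size=64,        # per-group quantization granularity
    ratio=0.8,            # fraction of layers compressed to INT4 (rest INT8)
    awq=True,             # activation-aware weight quantization
    scale_estimation=True,
    dataset="wikitext2",  # calibration data for AWQ / scale estimation
)

model = OVModelForCausalLM.from_pretrained(
    model_id, export=True, quantization_config=quant_config
)
model.save_pretrained("tinyllama-ov-int4")
```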
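Finally, the exported (or quantized) IR directory can be served with the OpenVINO GenAI LLMPipeline. A minimal Python sketch follows, with the directory name and prompt as placeholders; the C++ API exposes the same pipeline shape, which is what enables the reduced-dependency client deployment described above.

```python
# Sketch: run generation on the exported IR directory with OpenVINO GenAI.
# Directory and prompt are placeholders; "CPU" can be swapped for another
# supported device such as "GPU".
import openvino_genai as ov_genai

# The directory must contain the IR plus the converted tokenizer files
pipe = ov_genai.LLMPipeline("tinyllama-ov-int4", "CPU")

result = pipe.generate("What is OpenVINO?", max_new_tokens=100)
print(result)
```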
Affected Systems
- Date: not specified
- Change type: capability
- Severity: info