Optimizing Stable Diffusion on Intel CPUs with NNCF and OpenVINO (ToMe, QAT)
AI Impact Summary
Intel-focused optimization of Stable Diffusion via OpenVINO, NNCF, and Diffusers demonstrates that CPU-bound inference can achieve GPU-like performance when a layered optimization stack is used. Converting to OpenVINO FP32 yields a ~1.9x latency improvement over PyTorch, 8-bit quantization raises the cumulative speedup to ~3.9x, and stacking Token Merging (ToMe) with quantization reaches ~5.1x faster inference without increasing the memory footprint. A key caveat is that post-training 8-bit quantization alone does not preserve accuracy for Stable Diffusion; quantization-aware training (QAT) with EMA and knowledge distillation is required to recover quality, and Token Merging must be adapted to work with OpenVINO. This workflow makes CPU-only Stable Diffusion viable on edge devices and CPU servers, enabling cheaper deployments with lower memory usage and faster prompt-to-image turnaround.
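As a minimal sketch of the OpenVINO path described above, the pipeline can be loaded through Optimum Intel's `OVStableDiffusionPipeline`; the model ID and prompt here are placeholder assumptions, not the article's exact configuration, and a quantized or ToMe-optimized checkpoint would be loaded the same way once exported.

```python
# Minimal sketch: exporting a Stable Diffusion checkpoint to OpenVINO IR
# and running CPU-only inference via Optimum Intel. The model ID and
# prompt are placeholders.
from optimum.intel import OVStableDiffusionPipeline

model_id = "runwayml/stable-diffusion-v1-5"  # assumed base checkpoint

# export=True converts the PyTorch weights to OpenVINO IR on the fly;
# a pre-converted (e.g. 8-bit quantized) model loads without it.
pipeline = OVStableDiffusionPipeline.from_pretrained(model_id, export=True)
pipeline.compile()  # compile up front so the first call is not penalized

image = pipeline("sailing ship in a storm by Rembrandt").images[0]
image.save("result.png")
```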
Affected Systems
- Date: not specified
- Change type: capability
- Severity: info