Intel Introduces AutoRound: Advanced Quantization for LLMs and VLMs
Action Required
Organizations can deploy larger, more complex LLMs and VLMs on Intel hardware with improved performance and a smaller resource footprint, enabling new use cases and lowering operational costs.
AI Impact Summary
Intel is introducing AutoRound, a new quantization tool designed to improve the efficiency of large language models (LLMs) and vision-language models (VLMs) by enabling low-bit quantization (INT2–INT8) with minimal accuracy loss. This capability is particularly valuable for deploying these models on resource-constrained devices. AutoRound also offers significant speed improvements over existing methods, quantizing a 72B model in just 37 minutes on a single A100 GPU. This release represents a significant advancement in model optimization for Intel's hardware ecosystem.
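The low-bit idea can be illustrated with a minimal round-to-nearest sketch. Note this is an illustrative toy, not AutoRound's actual algorithm (AutoRound additionally tunes the rounding to reduce accuracy loss); the `quantize_dequantize` helper below is a hypothetical name introduced here for the example.

```python
# Minimal sketch of symmetric low-bit weight quantization (round-to-nearest).
# Illustrates the INT2-INT8 trade-off only; AutoRound's real method is more
# sophisticated and preserves accuracy far better at the same bit width.

def quantize_dequantize(weights, bits):
    """Quantize floats to signed `bits`-wide integers, dequantize back,
    and return the reconstructed values plus the max absolute error."""
    qmax = 2 ** (bits - 1) - 1                      # e.g. 7 for INT4
    scale = max(abs(w) for w in weights) / qmax or 1.0
    q = [max(-qmax - 1, min(qmax, round(w / scale))) for w in weights]
    deq = [qi * scale for qi in q]
    err = max(abs(w - d) for w, d in zip(weights, deq))
    return deq, err

weights = [0.31, -0.94, 0.07, 0.55, -0.21]
for bits in (2, 4, 8):
    _, err = quantize_dequantize(weights, bits)
    print(f"INT{bits}: max reconstruction error = {err:.4f}")
```

Running this shows the reconstruction error shrinking as the bit width grows, which is why naive INT2 quantization degrades models badly and why tools like AutoRound, which optimize the rounding itself, matter at very low bit widths.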
Affected Systems
- Date: not specified
- Change type: capability
- Severity: high