Introducing AutoRound: Intel’s Advanced Quantization for LLMs and VLMs
AI Impact Summary
Intel is introducing AutoRound, a new quantization tool designed to improve the efficiency of large language models (LLMs) and vision-language models (VLMs) by reducing model size and inference latency. AutoRound uses a weight-only post-training quantization method based on signed gradient descent, achieving up to 2.1x higher relative accuracy at INT2 compared to existing baselines. This capability is particularly valuable for deploying models in resource-constrained environments.
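The core idea behind the signed-gradient-descent approach can be illustrated with a toy sketch: instead of plain round-to-nearest, a small per-weight rounding offset is tuned to minimize a layer's reconstruction error on calibration data, stepping in the sign of the (straight-through) gradient. The sketch below is a minimal, self-contained illustration of that idea in numpy; all names, shapes, and hyperparameters are illustrative assumptions, not AutoRound's actual implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(16, 16))                             # full-precision layer weights
X = rng.normal(size=(64, 4)) @ rng.normal(size=(4, 16))  # correlated calibration inputs
Y = X @ W.T                                               # reference layer outputs

bits = 4
qmax = 2 ** (bits - 1) - 1
scale = np.abs(W).max() / qmax                            # simple per-tensor scale

def dequant(v):
    # Round with a learned offset v in [-0.5, 0.5], clip to the INT4 range,
    # then map back to float.
    q = np.clip(np.round(W / scale + v), -qmax - 1, qmax)
    return q * scale

def recon_loss(v):
    # Layer reconstruction error on the calibration batch.
    return np.linalg.norm(X @ dequant(v).T - Y)

v = np.zeros_like(W)                   # rounding offsets to be learned
best_v, best_loss = v.copy(), recon_loss(v)
lr = 0.05
for _ in range(300):
    err = X @ dequant(v).T - Y
    grad = err.T @ X                   # straight-through gradient w.r.t. v (scale > 0)
    v = np.clip(v - lr * np.sign(grad), -0.5, 0.5)  # signed gradient descent step
    cur = recon_loss(v)
    if cur < best_loss:                # keep the best offsets seen (toy safeguard)
        best_v, best_loss = v.copy(), cur

rtn_loss = recon_loss(np.zeros_like(W))  # plain round-to-nearest baseline
print(f"RTN loss: {rtn_loss:.4f}  tuned loss: {best_loss:.4f}")
```

The offsets are bounded to [-0.5, 0.5] so each weight can only move to an adjacent rounding level; the signed update makes step sizes uniform across weights regardless of gradient magnitude.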
Affected Systems
- Date: not specified
- Change type: capability
- Severity: info