Introducing AutoRound: Intel’s Advanced Quantization for LLMs and VLMs
AI Impact Summary
Intel is introducing AutoRound, a new quantization tool designed to improve the efficiency of large language models (LLMs) and vision-language models (VLMs) by reducing model size and inference latency. AutoRound uses a weight-only post-training quantization method based on signed gradient descent, achieving up to 2.1x higher relative accuracy at INT2 compared to existing baselines. This capability is particularly valuable for deploying models in resource-constrained environments.
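The core idea behind the signed-gradient-descent approach can be illustrated with a toy sketch: instead of plain round-to-nearest, a small per-weight rounding offset is tuned to minimize a layer's reconstruction error on calibration data, stepping in the sign of the (straight-through) gradient. The sketch below is a minimal, self-contained illustration of that idea in numpy; all names, shapes, and hyperparameters are illustrative assumptions, not AutoRound's actual implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(16, 16))                             # full-precision layer weights
X = rng.normal(size=(64, 4)) @ rng.normal(size=(4, 16))  # correlated calibration inputs
Y = X @ W.T                                               # reference layer outputs

bits = 4
qmax = 2 ** (bits - 1) - 1
scale = np.abs(W).max() / qmax                            # simple per-tensor scale

def dequant(v):
    # Round with a learned offset v in [-0.5, 0.5], clip to the INT4 range,
    # then map back to float.
    q = np.clip(np.round(W / scale + v), -qmax - 1, qmax)
    return q * scale

def recon_loss(v):
    # Layer reconstruction error on the calibration batch.
    return np.linalg.norm(X @ dequant(v).T - Y)

v = np.zeros_like(W)                   # rounding offsets to be learned
best_v, best_loss = v.copy(), recon_loss(v)
lr = 0.05
for _ in range(300):
    err = X @ dequant(v).T - Y
    grad = err.T @ X                   # straight-through gradient w.r.t. v (scale > 0)
    v = np.clip(v - lr * np.sign(grad), -0.5, 0.5)  # signed gradient descent step
    cur = recon_loss(v)
    if cur < best_loss:                # keep the best offsets seen (toy safeguard)
        best_v, best_loss = v.copy(), cur

rtn_loss = recon_loss(np.zeros_like(W))  # plain round-to-nearest baseline
print(f"RTN loss: {rtn_loss:.4f}  tuned loss: {best_loss:.4f}")
```

The offsets are bounded to [-0.5, 0.5] so each weight can only move to an adjacent rounding level; the signed update makes step sizes uniform across weights regardless of gradient magnitude.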
Affected Systems
- Date: not specified
- Change type: capability
- Severity: info