Hugging Face Accelerate enables running large OPT/BLOOM models on RAM-constrained hardware via meta device and auto device maps
AI Impact Summary
Accelerate uses PyTorch's meta device to create empty shell models and an automatically inferred device map to allocate parts of each model across GPUs, CPU RAM, and disk offload, enabling inference for very large models without loading all weights into memory at once. This workflow supports OPT-6.7B, OPT-13B, and BLOOM on commodity hardware or notebooks, dramatically lowering the infrastructure needed for experimentation. Teams must use init_empty_weights, infer_auto_device_map, and no_split_module_classes correctly to ensure valid layer placement and to manage the performance trade-offs introduced by offloading.
Affected Systems
- Date: not specified
- Change type: capability
- Severity: info