Hugging Face Accelerate enables running large OPT/BLOOM models on RAM-constrained hardware via meta device and auto device maps
AI Impact Summary
Accelerate uses PyTorch's meta device to create empty shell models and an automatically inferred device map to allocate parts of each model across GPUs, CPU RAM, and disk offload, enabling inference for very large models without loading all weights into memory at once. This workflow supports OPT-6.7B, OPT-13B, and BLOOM on commodity hardware or notebooks, dramatically lowering the infrastructure needed for experimentation. Teams must use init_empty_weights, infer_auto_device_map, and no_split_module_classes correctly to ensure valid layer placement and to manage the performance trade-offs introduced by offloading.
Affected Systems
- Date: not specified
- Change type: capability
- Severity: info