Optimum + ONNX Runtime Training accelerates Hugging Face model training (up to 130% with DeepSpeed)
AI Impact Summary
Hugging Face Optimum now integrates ONNX Runtime Training to accelerate fine-tuning of large language, speech, and vision models, delivering speedups of 35% or more, and up to 130% when combined with DeepSpeed ZeRO Stage 1. The gains come from memory and compute optimizations (memory planning, kernel optimizations, multi-tensor apply for the Adam optimizer, FP16 mixed precision, and graph fusions), exposed through the ORTTrainer/ORTTrainingArguments API, which composes with DeepSpeed and eases hardware utilization across NVIDIA and AMD GPUs. For teams, this means faster training cycles with a clear migration path (Trainer to ORTTrainer, TrainingArguments to ORTTrainingArguments) and straightforward export to ONNX after training.
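The migration path above amounts to swapping two class names in an existing fine-tuning script. A minimal sketch, assuming the `ORTTrainer`/`ORTTrainingArguments` names from `optimum.onnxruntime` as described in the summary; the model, dataset, batch size, and DeepSpeed config path are illustrative placeholders, not from the source:

```python
# Sketch of migrating a transformers fine-tuning script to ONNX Runtime
# Training via Optimum. Only the two trainer classes change; the rest of
# the script stays the same. Placeholders: output_dir, batch size, and the
# DeepSpeed config filename are hypothetical examples.

def fine_tune(model, train_dataset, use_ort: bool = True):
    """Build and run a trainer; only the two class names differ."""
    if use_ort:
        # ONNX Runtime-backed drop-in replacements from Optimum
        from optimum.onnxruntime import ORTTrainer as Trainer
        from optimum.onnxruntime import ORTTrainingArguments as TrainingArguments
    else:
        from transformers import Trainer, TrainingArguments

    args = TrainingArguments(
        output_dir="./out",                  # placeholder path
        fp16=True,                           # mixed precision, one of the cited optimizations
        per_device_train_batch_size=8,       # placeholder value
        deepspeed="ds_config_zero1.json",    # hypothetical ZeRO-1 config file
    )
    trainer = Trainer(model=model, args=args, train_dataset=train_dataset)
    trainer.train()
    return trainer
```

Because the imports happen inside the function, the same script can toggle between the stock `Trainer` and the ONNX Runtime-backed one for A/B comparison.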
Affected Systems
- Date: not specified
- Change type: capability
- Severity: info