BridgeTower fine-tuning on Habana Gaudi2 with Optimum Habana v1.7 — up to 2.5x faster vs A100, 1.4x vs H100
AI Impact Summary
BridgeTower fine-tuning on Habana Gaudi2 with Optimum Habana v1.7 shows substantial throughput gains over NVIDIA GPUs. The gains come from hardware-accelerated data loading and the Gaudi2 architecture. With the 866M-parameter BridgeTower model, Gaudi2 reached between 1.28x and 1.79x the throughput of an H100 and up to 2.64x that of an A100, depending on dataloader_num_workers. Absolute throughput was roughly 600 to 770 samples/s. These results indicate that two settings are key levers for vision-language (VL) workloads: device-side data loading (the media pipeline) and a tuned dataloader_num_workers. Both should factor into capacity planning and vendor choice.
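Because the reported speedups hinge on dataloader_num_workers, a throughput sweep is the natural way to reproduce this kind of comparison on your own hardware. The following is a minimal, framework-agnostic sketch of such a sweep, under stated assumptions. `make_step`, `measure_throughput`, `sweep_num_workers` and `speedup` are hypothetical helpers, not part of Optimum Habana or Transformers. In a real run, `make_step(n)` would wrap one Gaudi or GPU training step built with `dataloader_num_workers=n`. A speedup figure such as 2.64x is simply candidate samples/s divided by baseline samples/s.

```python
import time
from typing import Callable, Dict, Iterable

# Hypothetical contract (not a library API): make_step(num_workers) returns a
# zero-argument callable that runs one training step and returns the number
# of samples it consumed (e.g. the batch size).
StepFactory = Callable[[int], Callable[[], int]]


def measure_throughput(step: Callable[[], int], num_steps: int, warmup: int = 2) -> float:
    """Return samples/s for `step`, excluding `warmup` initial calls from timing."""
    for _ in range(warmup):
        step()  # untimed warm-up (e.g. graph compilation, worker start-up)
    samples = 0
    start = time.perf_counter()
    for _ in range(num_steps):
        samples += step()
    elapsed = max(time.perf_counter() - start, 1e-9)  # guard against a zero interval
    return samples / elapsed


def sweep_num_workers(make_step: StepFactory, worker_counts: Iterable[int],
                      num_steps: int = 20) -> Dict[int, float]:
    """Measure throughput (samples/s) for each data-loading worker count."""
    return {n: measure_throughput(make_step(n), num_steps) for n in worker_counts}


def speedup(candidate: float, baseline: float) -> float:
    """Relative speedup, e.g. Gaudi2 samples/s divided by GPU samples/s."""
    return candidate / baseline


if __name__ == "__main__":
    # Toy stand-in for a real training step: more workers -> shorter simulated wait.
    def make_fake_step(num_workers: int) -> Callable[[], int]:
        def step() -> int:
            time.sleep(0.002 / (num_workers + 1))
            return 32  # pretend batch size
        return step

    results = sweep_num_workers(make_fake_step, [0, 1, 2])
    for n, tput in results.items():
        print(f"dataloader_num_workers={n}: {tput:.0f} samples/s")
    print("best setting:", max(results, key=results.get))
```

On a real accelerator, device work is usually asynchronous. Each `step` should therefore wait for the device to finish before returning, or the timer will measure only kernel dispatch. Running the same sweep on each platform with identical batch sizes keeps the resulting ratios comparable.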
Affected Systems
- Date: not specified
- Change type: capability
- Severity: info