TransformerEngine v1.11.0 adds MXFP8, MPS FP16/BF16, and Python 3.10 support
Action Required
This release enables higher-throughput FP8/FP16 training across NVIDIA and Apple Silicon environments. To realize the performance and capability gains, teams must upgrade PyTorch, set fp8_config accordingly, and adjust their accelerate/FSDP settings.
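The PyTorch version gates stated in this release (FP16 on MPS needs PyTorch 2.8, BF16 needs PyTorch 2.6) can be checked before picking a training dtype. A minimal sketch; the helper name and the string return values are illustrative, not part of any library API:

```python
def mps_dtype_name(torch_version: str) -> str:
    """Pick the widest reduced-precision dtype usable on MPS for a given
    PyTorch version, per the release notes: FP16 requires PyTorch 2.8,
    BF16 requires PyTorch 2.6; otherwise fall back to full precision.
    (Hypothetical helper for illustration only.)"""
    major, minor = (int(x) for x in torch_version.split(".")[:2])
    if (major, minor) >= (2, 8):
        return "float16"
    if (major, minor) >= (2, 6):
        return "bfloat16"
    return "float32"
```

In practice the version string would come from `torch.__version__` and the resulting dtype would be passed to the AMP autocast context for the MPS device.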
AI Impact Summary
TransformerEngine now supports MXFP8 via the block-scaling path: enabling use_mxfp8_block_scaling in fp8_config unlocks MXFP8 arithmetic and improves FP8 training throughput. FP16/BF16 support on MPS devices expands Mac-based training; FP16 requires PyTorch 2.8, BF16 requires PyTorch 2.6, and AMP for MPS has been updated accordingly. FSDP v2 gains ignored_params and no_sync, simplifying distributed training, and the mixed-precision policy can now be passed as a string through the accelerate CLI or fsdp_config. Python 3.10 support aligns the project with modern runtimes.
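A sketch of what the corresponding accelerate config might look like; aside from use_mxfp8_block_scaling and the fsdp_config section, which this release names, the surrounding keys and values are assumptions:

```yaml
# Hypothetical accelerate config fragment; only use_mxfp8_block_scaling and
# the fsdp_config section name come from the release notes above.
mixed_precision: fp8
fp8_config:
  backend: TE                    # assumption: TransformerEngine backend
  use_mxfp8_block_scaling: true  # enables the MXFP8 block-scaling path
fsdp_config:
  fsdp_version: 2                # assumption: FSDP v2
  mixed_precision_policy: bf16   # policy can now be passed as a string
```

The same settings can alternatively be supplied through the accelerate CLI when launching, rather than through a saved config file.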
Affected Systems
- Date: not specified
- Change type: deprecation
- Severity: high