Together AI Optimizes ThunderKittens for NVIDIA Blackwell GPUs
Action Required
Developers can now accelerate their AI workloads on NVIDIA Blackwell GPUs using the ThunderKittens framework, leading to faster training and inference times.
AI Impact Summary
Together AI has optimized the ThunderKittens framework for NVIDIA Blackwell GPUs, using new hardware features such as fifth-generation tensor cores, Tensor Memory, and CTA pairs. The resulting GEMM and attention kernels are reported to run up to 2x faster than cuBLAS on H100 GPUs, so the comparison is between ThunderKittens kernels on Blackwell and the cuBLAS baseline on the previous-generation H100.
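To make the "up to 2x faster" figure concrete, the following is a minimal sketch of how a GEMM speedup is commonly computed from measured kernel times. It assumes the usual convention of counting 2·M·N·K floating-point operations per matrix multiply. The function names and the timings are hypothetical placeholders for illustration, not measurements from Together AI or from ThunderKittens.

```python
def gemm_tflops(m: int, n: int, k: int, seconds: float) -> float:
    """Throughput in TFLOPS for an (m x k) @ (k x n) GEMM.

    Uses the conventional count of 2*m*n*k floating-point operations
    (one multiply and one add per inner-product term).
    """
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return 2 * m * n * k / seconds / 1e12


def speedup(baseline_seconds: float, candidate_seconds: float) -> float:
    """How many times faster the candidate kernel is than the baseline."""
    if baseline_seconds <= 0 or candidate_seconds <= 0:
        raise ValueError("timings must be positive")
    return baseline_seconds / candidate_seconds


# Hypothetical placeholder timings (NOT real benchmark results):
# substitute times you measure yourself for the same problem size.
M = N = K = 8192
baseline_s = 2.0e-3   # assumed: baseline library kernel on the older GPU
candidate_s = 1.0e-3  # assumed: optimized kernel on the newer GPU

baseline_tflops = gemm_tflops(M, N, K, baseline_s)
candidate_tflops = gemm_tflops(M, N, K, candidate_s)
ratio = speedup(baseline_s, candidate_s)
```

Because both kernels perform the same number of operations for a fixed problem size, the ratio of the two times equals the ratio of the two throughputs, which is why a speedup can be quoted either way.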
Affected Systems
- Date: not specified
- Change type: capability
- Severity: high