AMD Releases Custom Kernels for MI300X GPU
Action Required
Organizations running LLM inference workloads on AMD MI300X GPUs can adopt these custom kernels to reduce latency and improve throughput.
AI Impact Summary
AMD is releasing custom kernels optimized for the MI300X GPU, specifically targeting Llama 3.1 405B inference in FP8. The release includes two kernels: a fused residual-connection, RMS-norm, and FP8-conversion kernel, and a Skinny GEMM kernel. Both achieve significant speedups when used with vLLM. The work, done in collaboration with Hugging Face, focuses on improving performance for open-source communities and offers a path to accelerated LLM inference on AMD hardware.
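To make the fused kernel's purpose concrete, here is a minimal NumPy sketch of the math it fuses: a residual add, Llama-style RMS normalization, and a per-tensor FP8-style conversion. This is an illustrative reference under stated assumptions, not AMD's actual kernel; the function name and the per-tensor scaling scheme are hypothetical, and the FP8 step is simulated by scaling and clipping to the E4M3 maximum rather than bit-level conversion. The point of the real kernel is to do all three steps in a single pass over memory instead of three separate GPU launches.

```python
import numpy as np

FP8_E4M3_MAX = 448.0  # largest representable magnitude in FP8 E4M3


def fused_residual_rmsnorm_fp8(x, residual, weight, eps=1e-6):
    """Reference (unfused) math for a fused residual-add + RMS-norm +
    FP8-conversion kernel. A fused GPU kernel computes all three steps
    in one pass; here they are written out separately for clarity."""
    # 1. Residual connection
    h = x + residual
    # 2. RMS normalization (Llama-style: no mean subtraction)
    rms = np.sqrt(np.mean(h * h, axis=-1, keepdims=True) + eps)
    normed = h / rms * weight
    # 3. Simulated FP8 conversion: per-tensor scale, then clip to E4M3 range
    #    (hypothetical scheme; real kernels may scale per-channel or per-block)
    scale = np.abs(normed).max() / FP8_E4M3_MAX
    q = np.clip(normed / scale, -FP8_E4M3_MAX, FP8_E4M3_MAX)
    return q, scale, h  # h is carried forward as the next residual stream


# Tiny usage example with Llama-like shapes scaled down
x = np.random.randn(2, 8).astype(np.float32)
r = np.random.randn(2, 8).astype(np.float32)
w = np.ones(8, dtype=np.float32)
q, s, h = fused_residual_rmsnorm_fp8(x, r, w)
```

Fusing these steps matters for inference because each unfused step is memory-bound; one combined kernel reads and writes the activations once instead of three times.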
Affected Systems
- Date: not specified
- Change type: capability
- Severity: high