AMD Releases Custom Kernels for MI300X GPU
Action Required
Organizations running LLM inference workloads on AMD MI300X GPUs can adopt these custom kernels to reduce latency and improve throughput.
AI Impact Summary
AMD is releasing custom kernels optimized for the MI300X GPU, specifically targeting Llama 3.1 405B inference in FP8. The release includes two kernels: a fused residual-connection, RMS-norm, and FP8-conversion kernel, and a Skinny GEMM kernel. Both achieve significant speedups when used with vLLM. The work, done in collaboration with Hugging Face, focuses on improving performance for open-source communities and offers a path to accelerated LLM inference on AMD hardware.
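To make the fused kernel's purpose concrete, here is a minimal NumPy sketch of the math it fuses: a residual add, Llama-style RMS normalization, and a per-tensor FP8-style conversion. This is an illustrative reference under stated assumptions, not AMD's actual kernel; the function name and the per-tensor scaling scheme are hypothetical, and the FP8 step is simulated by scaling and clipping to the E4M3 maximum rather than bit-level conversion. The point of the real kernel is to do all three steps in a single pass over memory instead of three separate GPU launches.

```python
import numpy as np

FP8_E4M3_MAX = 448.0  # largest representable magnitude in FP8 E4M3


def fused_residual_rmsnorm_fp8(x, residual, weight, eps=1e-6):
    """Reference (unfused) math for a fused residual-add + RMS-norm +
    FP8-conversion kernel. A fused GPU kernel computes all three steps
    in one pass; here they are written out separately for clarity."""
    # 1. Residual connection
    h = x + residual
    # 2. RMS normalization (Llama-style: no mean subtraction)
    rms = np.sqrt(np.mean(h * h, axis=-1, keepdims=True) + eps)
    normed = h / rms * weight
    # 3. Simulated FP8 conversion: per-tensor scale, then clip to E4M3 range
    #    (hypothetical scheme; real kernels may scale per-channel or per-block)
    scale = np.abs(normed).max() / FP8_E4M3_MAX
    q = np.clip(normed / scale, -FP8_E4M3_MAX, FP8_E4M3_MAX)
    return q, scale, h  # h is carried forward as the next residual stream


# Tiny usage example with Llama-like shapes scaled down
x = np.random.randn(2, 8).astype(np.float32)
r = np.random.randn(2, 8).astype(np.float32)
w = np.ones(8, dtype=np.float32)
q, s, h = fused_residual_rmsnorm_fp8(x, r, w)
```

Fusing these steps matters for inference because each unfused step is memory-bound; one combined kernel reads and writes the activations once instead of three times.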
Affected Systems
- Date: not specified
- Change type: capability
- Severity: high