Custom ROCm kernels for AMD MI300X to accelerate vLLM with Llama 3.1 405B FP8
AI Impact Summary
AMD MI300X-specific kernels are being developed to boost inference performance, including a fused residual-add + RMS-norm kernel, FP8 conversion, a fused SwiGLU kernel, and a skinny GEMM kernel, targeting Llama 3.1 405B in FP8 on 8×MI300X nodes with vLLM. The work is published in the hf-rocm-kernels repo and is planned for integration into the AMD fork of vLLM, with Python bindings and benchmarking scripts to reproduce the results. This unlocks kernel-level optimization on AMD hardware, offering a clear path to higher throughput and lower latency for large-scale generative workloads, contingent on adopting the provided repo and the follow-on integration steps.
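To make the fused operations concrete, here is a minimal numpy sketch of the unfused reference math that kernels like these typically replace. This is not the hf-rocm-kernels implementation; the function names and signatures are illustrative assumptions, showing only the element-wise computations (residual-add + RMS norm, and SwiGLU) that a fused HIP kernel would perform in a single pass over memory.

```python
import numpy as np

def rms_norm(x, weight, eps=1e-6):
    # Root-mean-square normalization as used in Llama-family models:
    # scale each row by the reciprocal RMS of its elements, then by a
    # learned per-channel weight.
    rms = np.sqrt(np.mean(x * x, axis=-1, keepdims=True) + eps)
    return (x / rms) * weight

def fused_residual_rms_norm(x, residual, weight, eps=1e-6):
    # The "fused residual + RMS norm" pattern: add the residual stream,
    # then normalize. Returning the updated residual alongside the
    # normalized output lets the next layer reuse it without a second
    # pass over memory (which is what fusing saves on the GPU).
    residual = residual + x
    return rms_norm(residual, weight, eps), residual

def silu(x):
    # SiLU (a.k.a. swish): x * sigmoid(x).
    return x / (1.0 + np.exp(-x))

def swiglu(gate, up):
    # SwiGLU activation from Llama's MLP block: silu(gate) * up.
    # A fused kernel computes this element-wise product directly
    # instead of materializing silu(gate) as an intermediate tensor.
    return silu(gate) * up
```

A fused GPU kernel performs the same arithmetic but reads and writes each tensor once, which is where the bandwidth savings on MI300X come from.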
Affected Systems
- Date: not specified
- Change type: capability
- Severity: info