Hugging Face: Custom ROCm kernels for AMD MI300X to accelerate vLLM with Llama 3.1 405B FP8 | SignalBreak