Run Vicuna 13B Chatbot on AMD GPU with ROCm and GPTQ
AI Impact Summary
This guide details running the Vicuna 13B language model, a 13 billion parameter chatbot, on a single AMD GPU using ROCm and GPTQ quantization. The key technical challenge is the model's memory footprint (approximately 28GB in fp16), which is addressed through 4-bit GPTQ quantization, reducing the memory requirement to around 7.5GB. This allows the model to run on GPUs with more limited memory, such as the AMD Instinct MI210 or Radeon RX 6900 XT, demonstrating a viable path to deploying large language models on consumer hardware. The process involves setting up ROCm, Docker, and Python, followed by model quantization and inference, ultimately exposing the model via a web API.
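As a rough sanity check on those figures, the memory savings follow directly from bits-per-parameter arithmetic. The short Python sketch below is illustrative only: the round 13e9 parameter count and the exclusion of activations, KV cache, and quantization metadata are simplifying assumptions, not numbers taken from the guide.

```python
# Back-of-the-envelope weight-memory math for a 13B-parameter model.
# Assumption: exactly 13e9 parameters; activations, KV cache, and
# quantization metadata (scales/zero-points) are ignored here.

PARAMS = 13e9  # approximate parameter count of Vicuna 13B


def weight_footprint_gib(bits_per_param: float) -> float:
    """Approximate weight memory in GiB at a given precision."""
    return PARAMS * bits_per_param / 8 / 1024**3


print(f"fp16 weights:  ~{weight_footprint_gib(16):.1f} GiB")  # ~24.2 GiB
print(f"4-bit weights: ~{weight_footprint_gib(4):.1f} GiB")   # ~6.1 GiB

# The guide's figures (~28GB in fp16, ~7.5GB after 4-bit GPTQ) sit a
# few GB above these raw weight numbers because real inference also
# needs quantization scales, activations, and the KV cache.
```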
Affected Systems
- Date: not specified
- Change type: capability
- Severity: info