Hugging Face LLM Inference DLC for Amazon SageMaker enables OpenAssistant 12B deployment
AI Impact Summary
Hugging Face introduces a purpose-built LLM Inference Deep Learning Container (DLC) for Amazon SageMaker, powered by Text Generation Inference (TGI), to simplify deploying open-source LLMs at scale. The example walks through retrieving the DLC image URI, creating a HuggingFaceModel bound to that image, and deploying the OpenAssistant/pythia-12b-sft-v8-7k-steps model to a ml.g5.12xlarge endpoint with multi-GPU support and quantization options, then runs end-to-end inference and a Gradio integration. This expands SageMaker's ability to host high-throughput chat LLMs, with explicit guidance on service quotas and GPU sizing, shortening time-to-value for deployments such as OpenAssistant and BLOOM-based workflows.
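The TGI container described above is configured through environment variables passed to the SageMaker model. A minimal sketch of building that configuration, assuming the model ID, GPU count, and token limits from the example (the helper name `build_tgi_env` is hypothetical; actual deployment additionally requires a `HuggingFaceModel` with the DLC image URI and an IAM role):

```python
import json

def build_tgi_env(model_id, num_gpus, quantize=None,
                  max_input_length=1024, max_total_tokens=2048):
    """Assemble TGI environment variables for a SageMaker LLM endpoint.

    Hypothetical helper; the variable names mirror the TGI DLC's
    documented configuration (HF_MODEL_ID, SM_NUM_GPUS, etc.).
    """
    env = {
        "HF_MODEL_ID": model_id,               # model to load from the Hub
        "SM_NUM_GPUS": json.dumps(num_gpus),   # tensor-parallel degree
        "MAX_INPUT_LENGTH": json.dumps(max_input_length),
        "MAX_TOTAL_TOKENS": json.dumps(max_total_tokens),
    }
    if quantize:
        # e.g. "bitsandbytes" to fit larger models on fewer GPUs
        env["HF_MODEL_QUANTIZE"] = quantize
    return env

# ml.g5.12xlarge exposes 4 A10G GPUs, matching the example deployment
env = build_tgi_env("OpenAssistant/pythia-12b-sft-v8-7k-steps", num_gpus=4)
print(env["SM_NUM_GPUS"])  # → "4"
```

This dictionary would then be passed as the `env` argument when constructing the `HuggingFaceModel`, before calling `.deploy(...)` with the chosen instance type.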
Affected Systems
- Date: not specified
- Change type: capability
- Severity: info