Introducing Hugging Face LLM Inference Container for Amazon SageMaker — Deploy Open Assistant 12B
AI Impact Summary
Amazon SageMaker now offers the Hugging Face LLM Inference Container, which simplifies deployment of open-source LLMs such as BLOOM and the Open Assistant models. The container is built on Text Generation Inference (TGI) for high-performance serving, using tensor parallelism and dynamic batching. The example deploys the Open Assistant 12B Pythia model on an `ml.g5.12xlarge` instance, shows the integration with the SageMaker Python SDK, and provides a Gradio chatbot interface for interacting with the endpoint.
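The deployment flow described above can be sketched with the SageMaker Python SDK. This is a minimal illustration, not the article's exact code: the model ID (`OpenAssistant/pythia-12b-sft-v8-7k-steps`), the GPU count, and the startup timeout are assumptions chosen to match the 12B model and the four A10G GPUs of an `ml.g5.12xlarge`.

```python
def tgi_env(model_id: str, num_gpus: int) -> dict:
    """Container environment for the TGI image: which Hugging Face model
    to load and how many GPUs to shard it across (tensor parallelism)."""
    return {
        "HF_MODEL_ID": model_id,       # assumed Open Assistant 12B checkpoint
        "SM_NUM_GPUS": str(num_gpus),  # ml.g5.12xlarge has 4 A10G GPUs
    }


def deploy_open_assistant(role: str):
    """Deploy the model behind a SageMaker real-time endpoint.

    Imports are kept local so the config helper above is usable without
    the `sagemaker` package installed.
    """
    from sagemaker.huggingface import (
        HuggingFaceModel,
        get_huggingface_llm_image_uri,
    )

    # Resolve the Hugging Face LLM Inference Container (TGI) image URI.
    image_uri = get_huggingface_llm_image_uri("huggingface")

    model = HuggingFaceModel(
        role=role,
        image_uri=image_uri,
        env=tgi_env("OpenAssistant/pythia-12b-sft-v8-7k-steps", num_gpus=4),
    )

    # Large models take a while to download and shard, so allow a longer
    # startup health-check window (seconds; illustrative value).
    return model.deploy(
        initial_instance_count=1,
        instance_type="ml.g5.12xlarge",
        container_startup_health_check_timeout=300,
    )
```

Once deployed, the returned predictor accepts TGI-style payloads, e.g. `predictor.predict({"inputs": prompt, "parameters": {"max_new_tokens": 256}})`, which is the request shape a Gradio chatbot front end would send.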
Affected Systems
- Date: not specified
- Change type: capability
- Severity: info