Introducing Hugging Face LLM Inference Container for Amazon SageMaker — Deploy Open Assistant 12B
AI Impact Summary
Amazon SageMaker now offers the Hugging Face LLM Inference Container, which simplifies deployment of open-source LLMs such as BLOOM and the Open Assistant models. The container is built on Text Generation Inference (TGI) for high-performance serving, using tensor parallelism and dynamic batching. The example deploys the Open Assistant 12B Pythia model on an `ml.g5.12xlarge` instance, shows the integration with the SageMaker Python SDK, and provides a Gradio chatbot interface for interacting with the endpoint.
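The deployment flow described above can be sketched with the SageMaker Python SDK. This is a minimal illustration, not the article's exact code: the model ID (`OpenAssistant/pythia-12b-sft-v8-7k-steps`), the GPU count, and the startup timeout are assumptions chosen to match the 12B model and the four A10G GPUs of an `ml.g5.12xlarge`.

```python
def tgi_env(model_id: str, num_gpus: int) -> dict:
    """Container environment for the TGI image: which Hugging Face model
    to load and how many GPUs to shard it across (tensor parallelism)."""
    return {
        "HF_MODEL_ID": model_id,       # assumed Open Assistant 12B checkpoint
        "SM_NUM_GPUS": str(num_gpus),  # ml.g5.12xlarge has 4 A10G GPUs
    }


def deploy_open_assistant(role: str):
    """Deploy the model behind a SageMaker real-time endpoint.

    Imports are kept local so the config helper above is usable without
    the `sagemaker` package installed.
    """
    from sagemaker.huggingface import (
        HuggingFaceModel,
        get_huggingface_llm_image_uri,
    )

    # Resolve the Hugging Face LLM Inference Container (TGI) image URI.
    image_uri = get_huggingface_llm_image_uri("huggingface")

    model = HuggingFaceModel(
        role=role,
        image_uri=image_uri,
        env=tgi_env("OpenAssistant/pythia-12b-sft-v8-7k-steps", num_gpus=4),
    )

    # Large models take a while to download and shard, so allow a longer
    # startup health-check window (seconds; illustrative value).
    return model.deploy(
        initial_instance_count=1,
        instance_type="ml.g5.12xlarge",
        container_startup_health_check_timeout=300,
    )
```

Once deployed, the returned predictor accepts TGI-style payloads, e.g. `predictor.predict({"inputs": prompt, "parameters": {"max_new_tokens": 256}})`, which is the request shape a Gradio chatbot front end would send.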
Affected Systems
- Date: not specified
- Change type: capability
- Severity: info