Hugging Face NVIDIA NIM API (serverless) launches; legacy serverless inference deprecated
AI Impact Summary
The legacy Hugging Face serverless inference service has been deprecated and replaced by Hugging Face NVIDIA NIM API (serverless), available to Enterprise Hub organizations via the Hugging Face Hub. The new API runs inference on NVIDIA DGX Cloud hardware with a pay-as-you-go, per-second billing model and exposes an OpenAI API-compatible interface using fine-grained access tokens. It showcases models like meta-llama/Meta-Llama-3-8B-Instruct within the NVIDIA NIM collection and signals deeper integration with NVIDIA TensorRT-LLM and the Text Generation Inference framework for improved open-model inference performance.
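Because the new API is OpenAI-compatible and authenticated with a fine-grained Hugging Face access token, a client call is just a standard chat-completions request with a swapped base URL. The sketch below builds such a request without performing network I/O; the `BASE_URL` value and the `hf_xxx` token placeholder are assumptions for illustration, so check the Hugging Face NIM API documentation for the actual endpoint available to your Enterprise Hub organization.

```python
import json

# Assumed base URL for the NIM API (serverless); verify against the
# official Hugging Face documentation before use.
BASE_URL = "https://huggingface.co/api/integrations/dgx/v1"

def build_chat_request(model: str, prompt: str, token: str):
    """Build the URL, headers, and JSON body for an OpenAI-compatible
    chat-completions call. No network I/O happens here."""
    url = f"{BASE_URL}/chat/completions"
    headers = {
        # A fine-grained Hugging Face access token, not an OpenAI key.
        "Authorization": f"Bearer {token}",
        "Content-Type": "application/json",
    }
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 64,
    })
    return url, headers, body

url, headers, body = build_chat_request(
    "meta-llama/Meta-Llama-3-8B-Instruct",
    "Say hello in one sentence.",
    token="hf_xxx",  # placeholder fine-grained token
)
```

Any OpenAI-compatible SDK can be pointed at the same endpoint by overriding its base URL and passing the Hugging Face token as the API key.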
Affected Systems
- Date: not specified
- Change type: capability
- Severity: info