Hugging Face NVIDIA NIM API (serverless) launches; legacy serverless inference deprecated
AI Impact Summary
The legacy Hugging Face serverless inference service has been deprecated and replaced by Hugging Face NVIDIA NIM API (serverless), available to Enterprise Hub organizations via the Hugging Face Hub. The new API runs inference on NVIDIA DGX Cloud hardware with a pay-as-you-go, per-second billing model and exposes an OpenAI API-compatible interface using fine-grained access tokens. It showcases models like meta-llama/Meta-Llama-3-8B-Instruct within the NVIDIA NIM collection and signals deeper integration with NVIDIA TensorRT-LLM and the Text Generation Inference framework for improved open-model inference performance.
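Because the new API is OpenAI-compatible and authenticated with a fine-grained Hugging Face access token, a client call is just a standard chat-completions request with a swapped base URL. The sketch below builds such a request without performing network I/O; the `BASE_URL` value and the `hf_xxx` token placeholder are assumptions for illustration, so check the Hugging Face NIM API documentation for the actual endpoint available to your Enterprise Hub organization.

```python
import json

# Assumed base URL for the NIM API (serverless); verify against the
# official Hugging Face documentation before use.
BASE_URL = "https://huggingface.co/api/integrations/dgx/v1"

def build_chat_request(model: str, prompt: str, token: str):
    """Build the URL, headers, and JSON body for an OpenAI-compatible
    chat-completions call. No network I/O happens here."""
    url = f"{BASE_URL}/chat/completions"
    headers = {
        # A fine-grained Hugging Face access token, not an OpenAI key.
        "Authorization": f"Bearer {token}",
        "Content-Type": "application/json",
    }
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 64,
    })
    return url, headers, body

url, headers, body = build_chat_request(
    "meta-llama/Meta-Llama-3-8B-Instruct",
    "Say hello in one sentence.",
    token="hf_xxx",  # placeholder fine-grained token
)
```

Any OpenAI-compatible SDK can be pointed at the same endpoint by overriding its base URL and passing the Hugging Face token as the API key.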
Affected Systems
- Date: not specified
- Change type: capability
- Severity: info