Hugging Face serverless GPU inference: old Cloudflare integration deprecated; migrate to Inference API, Endpoints, or Cloudflare Workers AI
AI Impact Summary
The old serverless GPU inference integration for Hugging Face is deprecated and replaced by two deployment paths: the Hugging Face Inference API/Endpoints, or the new Deploy on Cloudflare Workers AI option. This shift changes how teams provision and call open models: Cloudflare offers edge-based GPUs with pay-per-use pricing, while the Hugging Face APIs provide direct access to hosted deployments. The announcement highlights models such as Hermes 2 Pro on Mistral 7B and other popular options (Llama, Gemma, Meta Llama 2 7B) and explains the setup flow using REST or the Cloudflare AI SDK, including the required Cloudflare account and API token. Migration will affect tooling, credentials management, and potentially latency and cost profiles, depending on the chosen path.
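As a minimal sketch of the REST setup flow described above: Cloudflare Workers AI exposes models at an `/accounts/{account_id}/ai/run/{model}` path, authenticated with a Bearer API token. The account ID, token placeholders, and the exact model slug below are assumptions for illustration, not values from the announcement.

```python
import json
import urllib.request

API_BASE = "https://api.cloudflare.com/client/v4"

def build_run_request(account_id: str, api_token: str,
                      model: str, prompt: str) -> urllib.request.Request:
    """Build (but do not send) a Workers AI text-generation request.

    The URL shape follows Cloudflare's documented
    /accounts/{account_id}/ai/run/{model} REST endpoint.
    """
    url = f"{API_BASE}/accounts/{account_id}/ai/run/{model}"
    body = json.dumps({"prompt": prompt}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={
            "Authorization": f"Bearer {api_token}",  # Cloudflare API token
            "Content-Type": "application/json",
        },
    )

# Placeholder credentials; the model slug is an assumed example.
req = build_run_request(
    "YOUR_ACCOUNT_ID",
    "YOUR_API_TOKEN",
    "@hf/nousresearch/hermes-2-pro-mistral-7b",
    "Explain edge inference in one sentence.",
)
print(req.full_url)
# Sending it would be: urllib.request.urlopen(req) -- requires real credentials.
```

Calling the same model through the Hugging Face Inference API instead would use a different host and token, which is the main credentials-management difference between the two migration paths.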
Affected Systems
- Date: not specified
- Change type: capability
- Severity: info