Hugging Face serverless GPU inference: old Cloudflare integration deprecated; migrate to Inference API, Endpoints, or Cloudflare Workers AI
AI Impact Summary
The old serverless GPU inference integration for Hugging Face is deprecated and replaced by two deployment paths: the Hugging Face Inference API/Endpoints, or the new Deploy on Cloudflare Workers AI option. This shift changes how teams provision and call open models: Cloudflare offers edge-based GPUs with pay-per-use pricing, while the Hugging Face APIs provide direct access to hosted deployments. The announcement highlights models such as Hermes 2 Pro on Mistral 7B and other popular options (Llama, Gemma, Meta Llama 2 7B) and explains the setup flow using REST or the Cloudflare AI SDK, including the required Cloudflare account and API token. Migration will affect tooling, credentials management, and potentially latency and cost profiles, depending on the chosen path.
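As a minimal sketch of the REST setup flow described above: Cloudflare Workers AI exposes models at an `/accounts/{account_id}/ai/run/{model}` path, authenticated with a Bearer API token. The account ID, token placeholders, and the exact model slug below are assumptions for illustration, not values from the announcement.

```python
import json
import urllib.request

API_BASE = "https://api.cloudflare.com/client/v4"

def build_run_request(account_id: str, api_token: str,
                      model: str, prompt: str) -> urllib.request.Request:
    """Build (but do not send) a Workers AI text-generation request.

    The URL shape follows Cloudflare's documented
    /accounts/{account_id}/ai/run/{model} REST endpoint.
    """
    url = f"{API_BASE}/accounts/{account_id}/ai/run/{model}"
    body = json.dumps({"prompt": prompt}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={
            "Authorization": f"Bearer {api_token}",  # Cloudflare API token
            "Content-Type": "application/json",
        },
    )

# Placeholder credentials; the model slug is an assumed example.
req = build_run_request(
    "YOUR_ACCOUNT_ID",
    "YOUR_API_TOKEN",
    "@hf/nousresearch/hermes-2-pro-mistral-7b",
    "Explain edge inference in one sentence.",
)
print(req.full_url)
# Sending it would be: urllib.request.urlopen(req) -- requires real credentials.
```

Calling the same model through the Hugging Face Inference API instead would use a different host and token, which is the main credentials-management difference between the two migration paths.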
Affected Systems
- Date: not specified
- Change type: capability
- Severity: info