Text Generation Inference adds Intel Gaudi backend via multi-backend integration
AI Impact Summary
Intel Gaudi hardware is now natively supported in Text Generation Inference through a unified multi-backend integration, eliminating the separate tgi-gaudi fork (PR #3091). This expands production deployment options beyond GPUs, with Gaudi-specific features like multi-card inference and FP8 precision potentially improving throughput and cost-per-token for the supported models. The update covers a broad model lineup including Llama 3.1, Llama 3.3, Llama 3.2 Vision, Mistral, Mixtral, CodeLlama, Falcon 180B, Qwen2, Starcoder, Gemma, Llava-v1.6-Mistral-7B, and Phi-2, and provides a Docker-based getting-started path via the official Gaudi-enabled image. Teams should test their existing inference endpoints on Gaudi, verify compatibility with FP8 paths, and adjust deployment pipelines to use the Gaudi backend.
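As a sketch of the Docker-based getting-started path, the launch looks roughly like the following; the image tag and model ID here are illustrative assumptions (check the TGI release notes for the current Gaudi image tag), while the `--runtime=habana` and `HABANA_VISIBLE_DEVICES` settings come from Intel Gaudi's standard container runtime conventions:

```shell
# Run the Gaudi-enabled TGI image (tag and model are illustrative).
# --runtime=habana selects the Habana container runtime for Gaudi cards;
# HABANA_VISIBLE_DEVICES=all exposes every Gaudi device to the container.
docker run -p 8080:80 \
    --runtime=habana \
    -e HABANA_VISIBLE_DEVICES=all \
    -e HF_TOKEN=$HF_TOKEN \
    --cap-add=sys_nice --ipc=host \
    ghcr.io/huggingface/text-generation-inference:latest-gaudi \
    --model-id meta-llama/Llama-3.1-8B-Instruct

# Once the server is up, query it over the standard TGI HTTP API:
curl 127.0.0.1:8080/generate \
    -X POST -H 'Content-Type: application/json' \
    -d '{"inputs": "What is Gaudi?", "parameters": {"max_new_tokens": 32}}'
```

Because the backend sits behind the same TGI HTTP API, existing clients should not need code changes; only the image and runtime flags in the deployment pipeline change.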
Affected Systems
- Date: not specified
- Change type: capability
- Severity: info