Text Generation Inference adds Intel Gaudi backend via multi-backend integration
AI Impact Summary
Intel Gaudi hardware is now natively supported in Text Generation Inference through a unified multi-backend integration, eliminating the separate tgi-gaudi fork (PR #3091). This expands production deployment options beyond GPUs, with Gaudi-specific features like multi-card inference and FP8 precision potentially improving throughput and cost-per-token for the supported models. The update covers a broad model lineup including Llama 3.1, Llama 3.3, Llama 3.2 Vision, Mistral, Mixtral, CodeLlama, Falcon 180B, Qwen2, Starcoder, Gemma, Llava-v1.6-Mistral-7B, and Phi-2, and provides a Docker-based getting-started path via the official Gaudi-enabled image. Teams should test their existing inference endpoints on Gaudi, verify compatibility with FP8 paths, and adjust deployment pipelines to use the Gaudi backend.
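As a sketch of the Docker-based getting-started path, the launch looks roughly like the following; the image tag and model ID here are illustrative assumptions (check the TGI release notes for the current Gaudi image tag), while the `--runtime=habana` and `HABANA_VISIBLE_DEVICES` settings come from Intel Gaudi's standard container runtime conventions:

```shell
# Run the Gaudi-enabled TGI image (tag and model are illustrative).
# --runtime=habana selects the Habana container runtime for Gaudi cards;
# HABANA_VISIBLE_DEVICES=all exposes every Gaudi device to the container.
docker run -p 8080:80 \
    --runtime=habana \
    -e HABANA_VISIBLE_DEVICES=all \
    -e HF_TOKEN=$HF_TOKEN \
    --cap-add=sys_nice --ipc=host \
    ghcr.io/huggingface/text-generation-inference:latest-gaudi \
    --model-id meta-llama/Llama-3.1-8B-Instruct

# Once the server is up, query it over the standard TGI HTTP API:
curl 127.0.0.1:8080/generate \
    -X POST -H 'Content-Type: application/json' \
    -d '{"inputs": "What is Gaudi?", "parameters": {"max_new_tokens": 32}}'
```

Because the backend sits behind the same TGI HTTP API, existing clients should not need code changes; only the image and runtime flags in the deployment pipeline change.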
Affected Systems
- Date: not specified
- Change type: capability
- Severity: info