Google Cloud TPUs now supported on Hugging Face Inference Endpoints and Spaces (TPU v5e)
AI Impact Summary
Hugging Face now supports Google Cloud TPU v5e on Inference Endpoints and Spaces, enabling TPU-accelerated deployment of LLMs via Optimum TPU and Text Generation Inference. The offering provides three pod configurations (v5litepod-1/4/8) in the us-west1 region, with explicit pricing and memory guidance to help teams stay within the memory budget of each configuration when serving larger models. Deployable models include Gemma, Llama, and Mistral, giving teams a concrete path to lower latency and better cost efficiency for large-model inference on Hugging Face platforms.
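The memory guidance boils down to matching model size to pod HBM. A minimal sketch of that sizing check, assuming 16 GB HBM per v5e chip (so roughly 16/64/128 GB across v5litepod-1/4/8) and bf16 weights at 2 bytes per parameter; the headroom fraction is an illustrative assumption, since real serving also needs memory for the KV cache and activations:

```python
# Rough memory-fit check for TPU v5e pod configurations.
# Assumption: 16 GB HBM per v5e chip; bf16 weights (2 bytes/param).
POD_HBM_GB = {"v5litepod-1": 16, "v5litepod-4": 64, "v5litepod-8": 128}

def weights_gb(params_billion, bytes_per_param=2):
    """Approximate weight memory in GB (1B params in bf16 ~ 2 GB)."""
    return params_billion * bytes_per_param

def smallest_fitting_pod(params_billion, headroom=0.8):
    """Return the smallest pod whose HBM holds the weights while
    keeping a fraction of memory free for KV cache and activations."""
    need = weights_gb(params_billion)
    for pod, hbm in POD_HBM_GB.items():  # ordered smallest to largest
        if need <= hbm * headroom:
            return pod
    return None  # model too large for a single listed pod slice

for size in (2, 7, 70):
    print(f"{size}B params -> {smallest_fitting_pod(size)}")
```

Under these assumptions a 2B model (e.g. Gemma 2B) fits on v5litepod-1, a 7B model (Llama or Mistral class) needs v5litepod-4, and a 70B model exceeds all three listed configurations.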
Affected Systems
- Date: not specified
- Change type: capability
- Severity: info