Deploy serverless sentiment-analysis transformer on Google Cloud with Cloud Run and PyTorch
AI Impact Summary
The author documents deploying a serverless sentiment-analysis pipeline on Google Cloud using a PyTorch/Transformers stack with distilbert-base-uncased-finetuned-sst-2-english. They evaluate multiple Google Cloud services (AI Platform Prediction, App Engine, then Cloud Run) and settle on a Dockerized Flask/Gunicorn deployment to run the model in a serverless environment, highlighting a practical path from traditional hosting to scalable inference. For a low-traffic use case (~2k requests/month), this approach trades some latency for ops simplicity, but careful memory sizing (4 GB) and warm-up strategies are needed to meet latency expectations and avoid cold-start penalties.
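The serving path described above (Flask behind Gunicorn, running the Transformers sentiment pipeline) can be sketched as a minimal app. This is an illustrative reconstruction, not the author's code: the `create_app` factory, the `/predict` route, and the lazy-loading strategy (which defers the model download past process start, one way to soften cold starts) are assumptions.

```python
# Minimal sketch of a Flask inference service for the DistilBERT SST-2 model.
# Route names and the lazy-loading approach are illustrative assumptions.
from flask import Flask, jsonify, request

_classifier = None  # loaded lazily on first request, not at import time


def get_classifier():
    """Load the Transformers sentiment pipeline once, on first use."""
    global _classifier
    if _classifier is None:
        from transformers import pipeline
        _classifier = pipeline(
            "sentiment-analysis",
            model="distilbert-base-uncased-finetuned-sst-2-english",
        )
    return _classifier


def create_app():
    app = Flask(__name__)

    @app.route("/predict", methods=["POST"])
    def predict():
        payload = request.get_json(silent=True) or {}
        text = payload.get("text")
        if not text:
            return jsonify({"error": "missing 'text' field"}), 400
        result = get_classifier()(text)[0]
        return jsonify({"label": result["label"], "score": result["score"]})

    return app

# In production, Gunicorn would serve the factory, e.g.:
#   gunicorn --bind :8080 "app:create_app()"
```

A single Gunicorn worker keeps only one copy of the model in memory, which matters when the Cloud Run instance is sized at 4 GB.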
Affected Systems
- Date: not specified
- Change type: capability
- Severity: info
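For reference, a Dockerized Flask/Gunicorn deployment of this kind is typically packaged roughly as follows. This is a hedged sketch, not the author's Dockerfile: the file names (`app.py`, `requirements.txt`) and the assumption that the app exposes a `create_app()` factory are illustrative.

```dockerfile
# Illustrative Dockerfile for a Flask/Gunicorn inference container on Cloud Run.
FROM python:3.10-slim
WORKDIR /app
COPY requirements.txt .
# requirements.txt would list flask, gunicorn, torch, transformers
RUN pip install --no-cache-dir -r requirements.txt
COPY app.py .
ENV PORT=8080
# Cloud Run routes traffic to $PORT; one worker avoids duplicating the model in memory.
CMD exec gunicorn --bind :$PORT --workers 1 --threads 4 --timeout 0 "app:create_app()"
```

When deploying, Cloud Run's `--memory` flag (e.g. `--memory 4Gi`, matching the 4 GB sizing noted above) and a nonzero `--min-instances` are the usual levers for the memory and cold-start concerns the summary raises.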