Hugging Face production alerts: NAT Gateway throughput, Hub Request Logs Archival Success Rate, and logging pipeline health
AI Impact Summary
Three Alerts are deployed to monitor egress and logging pipelines in Hugging Face's production stack. The NAT Gateway Throughput alert provides early warning of unusual outbound traffic and cost-impacting spikes, especially during refactors or when third-party security and autoscaling tooling is introduced. The Hub Request Logs Archival Success Rate alert covers the end-to-end log pipeline (Filebeat, Logstash, Elasticsearch, AWS Glue, Athena) to prevent data loss and ensure timely archival. The narrative also flags potential Elasticsearch backpressure risk under high ingress or shard movement, indicating the need to guard the logging path against cascading failures.
Affected Systems
- Date
- Date not specified
- Change type
- capability
- Severity
- info