Join the OS/Platform team as a Site Reliability Engineer (SRE) and keep our services healthy, observable, and fast. Partnering with the Platform Engineering group, you'll own the day-to-day operation of our monitoring stack (Grafana, Prometheus, Loki, and Tempo), crafting dashboards that surface golden signals and drive real-time insight. You'll codify reliability through SLIs/SLOs, automate runbooks in Python, and lead incident response to maintain world-class uptime across both on-prem and AWS environments.