Join the OS/Platform team as a Site Reliability Engineer (SRE) and keep our services healthy, observable, and fast. Partnering with the Platform Engineering group, you'll own the day-to-day operation of our monitoring stack (Grafana, Prometheus, Loki, and Tempo) and build dashboards that surface the golden signals and give real-time insight into service health. You'll codify reliability through SLIs/SLOs, automate runbooks in Python, and lead incident response to maintain high uptime across both on-prem and AWS environments.
Job Type: Full-time
Career Level: Mid Level
Education Level: Bachelor's degree
Number of Employees: 251-500 employees