The Site Reliability Operations (SRO) Manager leads the team responsible for end-to-end observability, real-time monitoring, and operational response across Mercury’s production and non-production platforms. The role centers on proactive issue detection, live support during releases, and structured incident and problem management, with the goals of minimizing customer impact and driving long-term stability. The SRO Manager ensures that services are well instrumented (metrics, logs, traces, and dashboards), that alerts are actionable and tuned, and that root cause analysis (RCA) and follow-through on corrective actions are executed consistently. Working closely with application development, the DevOps COE, Site Reliability Engineering (SRE), and Infrastructure teams, the SRO Manager builds release and runtime practices that are observable by design, provides real-time operational support during deployments, and uses data-driven insights and automation to continuously improve system resilience, change success rates, and time to recovery.
Job Type
Full-time
Career Level
Manager