Lead, Site Reliability Engineer

Mastercard•O'Fallon, MO

8d•$122,000 - $207,000•Onsite

About The Position

The ONE SRE team is looking for a Lead Site Reliability Engineer to strengthen operational resilience across the technology stack through observability, automation, and platform engineering. The ideal candidate is highly motivated, intellectually curious, analytical, and passionate about improving reliability through proactive problem-solving, collaboration, and continuous improvement.

Requirements

Advanced expertise in Site Reliability Engineering, platform engineering, or infrastructure operations, with strong knowledge of F5 load balancer platforms in large-scale production environments
Strong experience implementing observability solutions across logs, metrics, and traces, and using telemetry to improve distributed system reliability and performance
Proficiency in Python, Go, Bash, or similar scripting languages to automate operational tasks, improve workflows, and enhance platform efficiency
Strong knowledge of Linux/Unix systems, networking, cloud and hybrid infrastructure, and highly available system design aligned to service level objectives
Hands-on experience with DevOps practices, including CI/CD pipelines, automation, and container-based deployments, along with strong troubleshooting and root cause analysis skills
Recognized as a technical expert who works independently on complex problems, influences outcomes across teams, and mentors others through shared engineering standards and best practices

Responsibilities

Serve as the subject matter expert for load balancer platforms, with a primary focus on F5 technologies, and improve platform reliability, scalability, and operability across the enterprise
Drive proactive reliability engineering by identifying systemic risks, recurring failure patterns, and architectural opportunities to strengthen resilience and performance
Lead observability improvements through telemetry, dashboards, alerting, and monitoring practices using tools such as Splunk and Dynatrace
Develop and enhance automation, CI/CD integrations, and DevOps practices that reduce manual effort and improve operational efficiency
Partner with Architecture, Load Balancer Engineering, Operations, and global SRE teams to influence standards, roadmaps, troubleshooting approaches, and end-to-end system design
Act as a senior escalation point for complex incidents, lead root cause analysis, and mentor engineers through shared documentation, runbooks, and best practices

Benefits

insurance (including medical, prescription drug, dental, vision, disability, life insurance)
flexible spending account and health savings account
16 weeks of new parent leave
up to 20 days of bereavement leave
80 hours of Paid Sick and Safe Time
25 days of vacation time
5 personal days
10 annual paid U.S. observed holidays
401(k) with a best-in-class company match
deferred compensation for eligible roles
fitness reimbursement or on-site fitness facilities
eligibility for tuition reimbursement