Lead, Site Reliability Engineer

Mastercard
O'Fallon, MO
$122,000 - $207,000
Onsite

About The Position

The ONE SRE team is looking for a Lead Site Reliability Engineer to strengthen operational resilience across the technology stack through observability, automation, and platform engineering. The ideal candidate is highly motivated, intellectually curious, analytical, and passionate about improving reliability through proactive problem-solving, collaboration, and continuous improvement.

Requirements

  • Advanced expertise in Site Reliability Engineering, platform engineering, or infrastructure operations, with strong knowledge of F5 load balancer platforms in large-scale production environments
  • Strong experience implementing observability solutions across logs, metrics, and traces, and using telemetry to improve distributed system reliability and performance
  • Proficiency in Python, Go, Bash, or similar scripting languages to automate operational tasks, improve workflows, and enhance platform efficiency
  • Strong knowledge of Linux/Unix systems, networking, cloud and hybrid infrastructure, and highly available system design aligned to service level objectives
  • Hands-on experience with DevOps practices, including CI/CD pipelines, automation, and container-based deployments, along with strong troubleshooting and root cause analysis skills
  • Recognized as a technical expert who works independently on complex problems, influences outcomes across teams, and mentors others through shared engineering standards and best practices

Responsibilities

  • Serve as the subject matter expert for load balancer platforms, with a primary focus on F5 technologies, and improve platform reliability, scalability, and operability across the enterprise
  • Drive proactive reliability engineering by identifying systemic risks, recurring failure patterns, and architectural opportunities to strengthen resilience and performance
  • Lead observability improvements through telemetry, dashboards, alerting, and monitoring practices using tools such as Splunk and Dynatrace
  • Develop and enhance automation, CI/CD integrations, and DevOps practices that reduce manual effort and improve operational efficiency
  • Partner with Architecture, Load Balancer Engineering, Operations, and global SRE teams to influence standards, roadmaps, troubleshooting approaches, and end-to-end system design
  • Act as a senior escalation point for complex incidents, lead root cause analysis, and mentor engineers through shared documentation, runbooks, and best practices

Benefits

  • insurance (including medical, prescription drug, dental, vision, disability, and life insurance)
  • flexible spending account and health savings account
  • 16 weeks of new parent leave
  • up to 20 days of bereavement leave
  • 80 hours of Paid Sick and Safe Time
  • 25 days of vacation time
  • 5 personal days
  • 10 annual paid U.S. observed holidays
  • 401k with a best-in-class company match
  • deferred compensation for eligible roles
  • fitness reimbursement or on-site fitness facilities
  • eligibility for tuition reimbursement