Sr Lead Site Reliability & Systems Engineer

Cox Enterprises
Austin, TX
$163,400 - $272,300
Hybrid

About The Position

We are seeking a Senior Lead Site Reliability & Systems Engineer — a versatile technical leader who combines deep SRE expertise with broad systems engineering capability. In this hybrid role you will drive platform reliability, operational excellence, and systems architecture across our infrastructure, ensuring our products are scalable, resilient, and delivered with high velocity. You will partner with engineering, product, and operations teams to embed reliability and sound systems design at every layer of the stack.

Requirements

  • 8+ years of experience in SRE, systems engineering, platform engineering, or DevOps roles
  • 3+ years in a senior or lead capacity with ownership of large-scale, distributed systems
  • Deep expertise in at least one major cloud provider — AWS preferred
  • Strong proficiency in Python, Go, Bash, Java, or C++
  • Hands-on experience with Kubernetes, container orchestration, and service mesh technologies
  • Solid understanding of Linux/Unix internals and networking (TCP/IP, DNS, TLS/SSL, load balancing)
  • Proficiency with observability tooling: Datadog, Prometheus/Grafana, Splunk, or equivalent
  • Proven track record defining and operating against SLOs and error budgets
  • Experience with infrastructure-as-code tools — Terraform required
  • Strong understanding of distributed systems design, security fundamentals, and data governance

Nice To Haves

  • Experience with service mesh (Istio, Linkerd) and API gateways (Kong, Apigee)
  • Background in systems integration across enterprise middleware, ERP, or CRM platforms
  • Familiarity with FinOps practices and cloud cost optimization
  • Experience in regulated industries: financial services, automotive, healthcare, or government
  • Familiarity with compliance frameworks: SOC 2, ISO 27001, or NIST
  • Track record of leading migrations — legacy-to-cloud or monolith-to-microservices
  • Relevant certifications: AWS Solutions Architect, CKA/CKAD, GCP Professional, or Red Hat RHCA

Responsibilities

  • Define and drive the SRE strategy, roadmap, and standards across engineering teams
  • Establish and enforce SLOs, SLIs, and error budgets across all production services
  • Own the incident management lifecycle — detection, response, resolution, and prevention
  • Lead blameless postmortems and translate findings into lasting systemic improvements
  • Manage on-call rotations and aggressively reduce toil through automation
  • Lead the design and evolution of large-scale, distributed systems and platform infrastructure
  • Define technical standards, architectural patterns, and engineering best practices org-wide
  • Evaluate and recommend technologies and tooling aligned to business and reliability requirements
  • Conduct architecture reviews and provide guidance on complex technical trade-offs
  • Lead capacity planning, performance engineering, and infrastructure scaling strategies
  • Build and maintain highly available, fault-tolerant infrastructure on cloud platforms (AWS/GCP/Azure)
  • Drive infrastructure-as-code adoption (Terraform) and enforce best practices
  • Architect and implement observability platforms — metrics, logging, tracing, and alerting
  • Build and improve CI/CD pipelines, deployment automation, and release engineering workflows
  • Lead chaos engineering and game day exercises to validate system resilience
  • Champion automation across provisioning, testing, deployment, and monitoring workflows
  • Mentor and grow a team of SREs, platform engineers, and systems engineers
  • Partner with DevOps, security, and product teams to align on shared platform goals
  • Serve as the technical escalation point for critical infrastructure incidents and outages
  • Communicate complex technical concepts clearly to non-technical stakeholders and leadership
  • Contribute to build vs. buy evaluations and drive strategic vendor assessments

Benefits

  • Competitive base salary + annual bonus
  • Comprehensive health, dental, and vision coverage
  • 401(k) with company match
  • Generous PTO and paid parental leave
  • Flexible hybrid work model
  • Learning & development budget (conferences, certs, courses)
  • Engineering-first culture with direct product impact
  • Collaborative teams and transparent leadership
  • Flexible vacation with pay
  • Seven paid holidays
  • Up to 160 hours of paid wellness annually
  • Bereavement leave
  • Time off to vote
  • Jury duty leave
  • Volunteer time off
  • Military leave
  • Parental leave
  • Health care insurance (medical, dental, vision)
  • Retirement planning (401(k))
  • Paid days off (sick leave, parental leave, flexible vacation/wellness days, and/or PTO)