Lead Site Reliability Engineer

Alteryx, Irvine, CA
$136,000 - $177,000

About The Position

Meet the Moment with Alteryx

We're living through a once-in-a-generation shift in how work gets done. Data, automation, and AI are quickly becoming the center of every business decision, and Alteryx is leading the transformation. You'll be working on the challenges at the heart of modern business. Whatever your role, your work will help organizations move faster, see more clearly, and tackle questions that used to feel impossible. If you're ready to meet the moment with innovation, curiosity, and excellence, there's a place for you here.

Why work for just any analytics company? At Alteryx, Inc., we are explorers, dreamers, and innovators. We're on a journey to build the best analytics platform in the world, but we can't do it without people like you leading the way. Forget the stereotypical tech companies of the past. Embrace the unconventional, exercise your imagination, and help alter the future with Alteryx.

We're looking for a Lead SRE to own reliability outcomes for a modern split-plane, multi-region SaaS platform serving enterprise customers. This is a hands-on technical leadership role focused on system design, reliability strategy, and cross-team execution. You'll lead efforts that directly improve SLO attainment, reduce MTTR, and increase cost efficiency, while shaping how reliability is engineered, measured, and scaled across the platform.

Requirements

  • 6+ years leading delivery of complex, distributed systems or SaaS platforms
  • Strong experience with multi-region, split-plane architectures (control-plane / data-plane)
  • Proven track record improving SLOs, MTTR, and system reliability at scale
  • Proficiency in languages like Python, Java, C++, or JavaScript
  • Deep experience with:
      • Kubernetes (multi-cluster), CI/CD, and GitOps (ArgoCD)
      • SLO/SLA design, observability, and incident management
      • Infrastructure as Code and cloud platforms
      • Disaster recovery, resilience, and security best practices
  • Strong leadership skills with experience mentoring senior engineers and influencing cross-team decisions

Nice To Haves

  • Experience with chaos engineering and large-scale reliability automation
  • Background in enterprise SaaS platforms or split-plane architectures
  • Expertise in navigating, understanding, and leveraging modern observability platforms (Datadog, Grafana, etc.)

Responsibilities

  • Define and drive reliability strategy across control-plane and data-plane systems, including multi-region resilience, BCDR, and failover design
  • Establish and operationalize SLOs, SLAs, and error budgets, ensuring they inform planning and engineering tradeoffs
  • Lead initiatives that measurably improve MTTR, incident prevention, and overall service health
  • Own incident management end-to-end, driving systemic fixes and long-term reliability improvements beyond immediate response
  • Lead architecture and design reviews to ensure systems meet scalability, reliability, and cost efficiency goals
  • Champion automation and modernization, including AI-driven reliability improvements
  • Establish and enforce code quality and review standards
  • Lead cross-functional initiatives and align engineering with product priorities
  • Mentor senior engineers and act as a technical leader across teams

Benefits

  • Employees may also be eligible for a wide range of other benefits, such as a bonus or commission, medical, retirement, financial, wellness, time off, employee discounts, and others.
  • Alteryx has amazing benefits for all Associates.