Lead Site Reliability Engineer

Alteryx•Irvine, CA

79d•$136,000 - $177,000

About The Position

Meet the Moment with Alteryx

We're living through a once-in-a-generation shift in how work gets done. Data, automation, and AI are quickly becoming the center of every business decision - and Alteryx is leading the transformation. You'll be working on the challenges that sit at the heart of modern business. No matter your role, the work you do will help organizations move faster, see more clearly, and tackle questions that used to feel impossible. If you're ready to meet the moment with innovation, curiosity, and excellence, there's a place for you here.

Why work for just any analytics company? At Alteryx, Inc., we are explorers, dreamers and innovators. We’re on a journey to build the best analytics platform in the world, but we can’t do it without people like you, leading the way. Forget the stereotypical tech companies of the past. Embrace the unconventional, exercise your imagination and help alter the future with Alteryx.

We’re looking for a Lead SRE to own reliability outcomes for a modern split-plane, multi-region SaaS platform serving enterprise customers. This is a hands-on technical leadership role focused on system design, reliability strategy, and cross-team execution. You’ll lead efforts that directly impact SLO attainment, MTTR reduction, and cost efficiency, while shaping how reliability is engineered, measured, and scaled across the platform.

Requirements

6+ years leading delivery of complex, distributed systems or SaaS platforms
Strong experience with multi-region, split-plane architectures (control-plane / data-plane)
Proven track record improving SLOs, MTTR, and system reliability at scale
Proficiency in languages like Python, Java, C++, or JavaScript
Deep experience with:
Kubernetes (multi-cluster), CI/CD, and GitOps (ArgoCD)
SLO/SLA design, observability, and incident management
Infrastructure as Code and cloud platforms
Disaster recovery, resilience, and security best practices
Strong leadership skills with experience mentoring senior engineers and influencing cross-team decisions

Nice To Haves

Experience with chaos engineering and large-scale reliability automation
Background in enterprise SaaS platforms or split-plane architectures
Expertise in navigating, understanding, and leveraging modern observability platforms (Datadog, Grafana, etc.)

Responsibilities

Define and drive reliability strategy across control-plane and data-plane systems, including multi-region resilience, BCDR, and failover design
Establish and operationalize SLOs, SLAs, and error budgets, ensuring they inform planning and engineering tradeoffs
Lead initiatives that measurably improve MTTR, incident prevention, and overall service health
Own incident management end-to-end, driving systemic fixes and long-term reliability improvements beyond immediate response
Lead architecture and design reviews to ensure systems meet scalability, reliability, and cost efficiency goals
Champion automation and modernization, including AI-driven reliability improvements
Establish and enforce code quality and review standards
Lead cross-functional initiatives and align engineering with product priorities
Mentor senior engineers and act as a technical leader across teams

Benefits

Employees may also be eligible for a wide range of other benefits, such as a bonus or commission, medical, retirement, financial, wellness, time off, employee discounts, and others.
Alteryx has amazing benefits for all Associates which can be viewed here.
