THE ROLE:
We are seeking a highly experienced Principal II, Site Reliability Engineer (SRE) to lead the strategy and execution of reliability engineering across Herbalife’s global platforms. This role focuses on building and scaling resilient, observable systems, advancing multi-cloud operations, and embedding reliability, automation, and guidelines across engineering teams. You will define standards, drive adoption of modern infrastructure practices, and ensure that our services deliver performance, availability, and reliability at scale.

HOW YOU WOULD CONTRIBUTE:
Architect resilient platforms and tooling across Azure and GCP, using Kubernetes, serverless technologies, and infrastructure as code.
Drive observability and monitoring practices with Dynatrace, Splunk, and OpenTelemetry, establishing metrics, tracing, alerting, and actionable dashboards.
Design and implement GitOps workflows for consistent, auditable, and secure infrastructure and application deployments.
Lead infrastructure automation with Terraform and related tooling to enable scalable, self-service provisioning and governance.
Define and enforce SLOs, SLIs, and error budgets to measure and improve system reliability and customer experience.
Develop operational standards and runbooks for incident response, disaster recovery, and performance management.
Partner with application and infrastructure teams to ensure reliability, scalability, and cost-efficiency are built into every layer of the stack.
Mentor and influence engineering teams to adopt modern SRE practices and drive a culture of operational excellence.

WHAT’S SPECIAL ABOUT THE TEAM:
The SRE team is expanding its scope beyond traditional operations, embedding observability, automation, and cloud-native practices across Herbalife’s platform. Our mission is to ensure production systems are resilient, observable, and scalable, while enabling application teams to move quickly with confidence in Azure, GCP, and hybrid environments.

Job Type: Full-time
Career Level: Principal
Number of Employees: 5,001-10,000 employees