Principal Site Reliability Engineer

Zscaler•San Jose, CA

Hybrid

About The Position

We are looking for a Principal Site Reliability Engineer to join our Infrastructure services and architecture team. The role is hybrid, with three days a week onsite in San Jose, CA, or it can be remote. It reports to the Senior Manager, Cloud Operations. As a lead engineer, you will leverage deep expertise in IaC/CaC, Linux virtualization, and physical hardware management to drive our infrastructure forward. You will oversee networking services and general software engineering to ensure our systems remain scalable, resilient, and high-performing.

Requirements

Expert-level proficiency with Kubernetes
Deep professional experience with Terraform and Ansible
Expert-level programming skills in Python or Go
Hands-on experience with Enterprise Linux distributions such as Rocky, Red Hat, or Alma
Proven experience using Git within a structured SDLC
U.S. citizenship due to the nature of the customers assigned to this role

Nice To Haves

Deep knowledge of Linux hypervisors and virtualization platforms such as OpenStack, Proxmox, libvirt, or QEMU
Technical experience working with FreeBSD
Familiarity with Identity and Access Management (IAM) tools and protocols such as HashiCorp Vault, LDAP, or OIDC

Responsibilities

Mentor junior engineers and lead high-impact infrastructure projects
Support business operations by responding to alerts and triaging systems to restore mission-critical capabilities
Ensure the reliability and performance of all customer-facing services
Design, document, and build technical solutions that align with evolving organizational needs
Support Agile processes to maintain high velocity and a collaborative environment