Staff Site Reliability Engineer

Zscaler•San Jose, CA

7d•$119,000 - $170,000•Hybrid

About The Position

Zscaler accelerates digital transformation to ensure our customers can be more agile, efficient, resilient, and secure. As an AI-forward enterprise, we are constantly pushing the envelope, leveraging the world’s largest security data lake to power our cloud-native Zero Trust Exchange platform. This innovation protects our customers from cyberattacks and data loss by securely connecting users, devices, and applications in any location.

Here, impact in your role matters more than title, and trust is built on results. We say: impact over activity. We seek innovators who actively use AI to amplify their impact and who thrive in an environment where we leverage intelligent systems to stay ahead of evolving threats. We believe in transparency and value constructive, honest debate; we’re focused on getting to the best ideas, faster. We build high-performing teams that can make an impact quickly and with high quality. To do this, we are building a culture of execution centered on customer obsession, collaboration, ownership, and accountability. We value high impact and high accountability with a sense of urgency, where you’re enabled to do your best work and embrace your potential.

If you’re driven by purpose, thrive on solving complex challenges, and want to be part of the team that’s helping to secure the AI age, we invite you to bring your talents to Zscaler and help shape the future of cybersecurity.

Role

We are looking for a Staff Site Reliability Engineer (Automation) to join our Engineering team. This is a hybrid role based in San Jose, CA (3 days in office), reporting to the Director, Site Reliability Engineer. You will be a key driver in provisioning and deploying new infrastructure, focusing heavily on infrastructure automation. Your expertise will help manage how customer traffic is routed within the cloud and ensure seamless troubleshooting across hardware and automated systems.

Requirements

5+ years of relevant experience in site reliability or systems engineering
Proficiency with Python or Ansible for automation tasks, and with interacting with external APIs
Demonstrated experience building and maintaining automation solutions
Strong background in systems administration, specifically with Linux or other major operating systems
Bachelor’s degree in Computer Science, a related field, or equivalent practical experience

Nice To Haves

Hands-on experience with Systems Kickstart using PXE, and with monitoring and observability tools such as Prometheus, Grafana, or Nagios

Responsibilities

Manage and maintain large-scale distributed systems using an infrastructure-as-code approach
Develop and enhance tools to automate the deployment and management of large-scale services, focusing on reliable system architecture and maintaining high code quality
Diagnose and resolve issues by editing code, adjusting infrastructure configurations, conducting performance and network analysis, and creating reusable tools
Develop automation solutions and manage services efficiently using version-controlled infrastructure-as-code
Support mission-critical services and participate in on-call rotations as needed