Staff Site Reliability Engineer

Zscaler | San Jose, CA
$119,000 - $170,000 | Hybrid

About The Position

Zscaler accelerates digital transformation to ensure our customers can be more agile, efficient, resilient, and secure. As an AI-forward enterprise, we are constantly pushing the envelope, leveraging the world’s largest security data lake to power our cloud-native Zero Trust Exchange platform. This innovation protects our customers from cyberattacks and data loss by securely connecting users, devices, and applications in any location.

Here, impact in your role matters more than title and trust is built on results. We say, impact over activity. We seek innovators who actively use AI to amplify their impact and who thrive in an environment where we leverage intelligent systems to stay ahead of evolving threats. We believe in transparency and value constructive, honest debate; we’re focused on getting to the best ideas, faster. We build high-performing teams that can make an impact quickly and with high quality.

To do this, we are building a culture of execution centered on customer obsession, collaboration, ownership, and accountability. We value high impact and high accountability with a sense of urgency, where you’re enabled to do your best work and embrace your potential. If you’re driven by purpose, thrive on solving complex challenges, and want to be part of the team that’s helping to secure the AI age, we invite you to bring your talents to Zscaler and help shape the future of cybersecurity.

Role

We are looking for a Staff Site Reliability Engineer (Automation) to join our Engineering team. This is a hybrid role based in San Jose, CA (3 days in office), reporting to the Director, Site Reliability Engineer. You will be a key driver in provisioning and deploying new infrastructure, focusing heavily on infrastructure automation. Your expertise will help manage how customer traffic is routed within the cloud and ensure seamless troubleshooting across hardware and automated systems.

Requirements

  • 5+ years of relevant experience in site reliability or systems engineering
  • Proficiency with Python or Ansible for automation tasks, and with interacting with external APIs
  • Demonstrated experience building and maintaining automation solutions
  • Strong background in systems administration, specifically with Linux or other major operating systems
  • Bachelor’s degree in Computer Science, a related field, or equivalent practical experience

Nice To Haves

  • Hands-on experience with system provisioning via Kickstart over PXE, and with monitoring and observability tools such as Prometheus, Grafana, or Nagios

Responsibilities

  • Manage and maintain large-scale distributed systems using an infrastructure-as-code approach
  • Develop and enhance tools to automate the deployment and management of large-scale services, focusing on reliable system architecture and maintaining high code quality
  • Diagnose and resolve issues by editing code, adjusting infrastructure configurations, conducting performance and network analysis, and creating reusable tools
  • Develop automation solutions and manage services efficiently using version-controlled infrastructure-as-code
  • Support mission-critical services and participate in on-call rotations as needed

Benefits

  • Various health plans
  • Time off plans for vacation and sick time
  • Parental leave options
  • Retirement options
  • Education reimbursement
  • In-office perks and more