Staff Site Reliability Engineer

Zscaler, San Jose, CA (Hybrid)

About The Position

Zscaler is a pioneer and global leader in zero trust security. The world’s largest businesses, critical infrastructure organizations, and government agencies rely on Zscaler to secure users, branches, applications, data & devices, and to accelerate digital transformation initiatives. Distributed across more than 160 data centers globally, the Zscaler Zero Trust Exchange platform, combined with advanced AI, combats billions of cyber threats and policy violations every day and unlocks productivity gains for modern enterprises by reducing costs and complexity.

Here, impact in your role matters more than title, and trust is built on results. We believe in transparency and value constructive, honest debate; we’re focused on getting to the best ideas, faster. We build high-performing teams that can make an impact quickly and with high quality. To do this, we are building a culture of execution centered on customer obsession, collaboration, ownership and accountability. We champion an “AI Forward, People First” philosophy to help us accelerate and innovate, empowering our people to embrace their potential. If you’re driven by purpose, thrive on solving complex challenges and want to make a positive difference on a global scale, we invite you to bring your talents to Zscaler to help shape the future of cybersecurity.

We are looking for a Staff Site Reliability Engineer to join our team. This role will report to the Senior Manager, Site Reliability Engineering, and offers the flexibility of a hybrid schedule (3 days a week) out of San Jose, CA, or can be performed fully remote. As a key member of the Zero Trust Exchange team, you will be responsible for all aspects of Zscaler’s production data center services, including servers, operating systems, storage, and supporting systems. You will be an instrumental part of the Cloud Operations team, ensuring the availability, latency, performance, efficiency, and scalability of a cloud that processes tens of billions of transactions daily.

Requirements

  • US citizenship is required (due to the nature of assigned customers)
  • 5+ years of industry experience in a 24/7 NOC or Cloud Operations environment
  • Proficiency with programming languages such as Python or Bash
  • Deep understanding of standard networking protocols, including HTTP, DNS, TCP/IP, and ICMP, and of the OSI model
  • Hands-on experience with monitoring tools (e.g. Nagios, Grafana, Prometheus) and with networking concepts such as firewalls and load balancing
  • Ability and flexibility to work after hours or on weekends for application releases and deployments in a fast-paced environment

Nice To Haves

  • Experience with programming languages like Go
  • Experience with incident management and the ability to drive incidents to resolution
  • Bachelor’s or Master’s degree in computer science or relevant field (or equivalent experience)

Responsibilities

  • Design, code, and deploy software solutions and automation while looking for opportunities to optimize the existing code-base for maintainability and reusability
  • Create and deploy scalable monitoring systems and end-to-end solutions for a rapidly growing global infrastructure in collaboration with Software Engineering and Development teams
  • Monitor applications and services within the environments, participate in the on-call rotation, and implement strategies to prevent issues from recurring
  • Resolve escalated issues and prevent recurring operational overhead by documenting and automating processes while deploying patches, upgrades, and administrative tools
  • Collaborate with cross-functional teams to recommend integration strategies for platforms and applications, and continuously identify opportunities for process improvement

Benefits

  • Various health plans
  • Time off plans for vacation and sick time
  • Parental leave options
  • Retirement options
  • Education reimbursement
  • In-office perks, and more!