Site Reliability Engineer

Zoom, San Jose, CA
Hybrid

About The Position

As part of the Network Automation team, this role builds and maintains the systems and tools that keep Zoom's global data center networks performant, available, and secure. These tools are essential to the infrastructure, and the work requires collaboration with various teams to ensure effective data center management. The ideal candidate is motivated, goal-driven, and team-oriented, with experience in networking, coding, and system infrastructure. Responsibilities include developing new tools and overseeing current infrastructure to enhance operational efficiency and reliability.

Requirements

  • 2-5 years of development experience with programming languages such as Python and Go.
  • Experience with CI/CD pipeline development and management (e.g., Jenkins, GitLab runners, Azure DevOps).
  • 2-5 years of experience with Linux administration and scripting.
  • Experience with network monitoring tools or suites such as Grafana, Prometheus, SolarWinds, Zabbix, or Nagios.

Responsibilities

  • Develop new and maintain existing tooling, APIs, and CI/CD pipelines that help manage Zoom's data center networks.
  • Work extensively with infrastructure-as-code frameworks to manage network and monitoring configurations.
  • Partner with network and data center engineers to implement new features and changes to our systems.
  • Develop and publish metrics and dashboards showing the health of our network and the automation tools.
  • Identify and troubleshoot system and tooling failures to ensure reliable performance of Zoom's networks.
  • Participate in an on-call rotation.