Senior Site Reliability Engineer

Zoom | San Jose, CA
1d | Hybrid

About The Position

At Zoom, we build and operate the global infrastructure that powers real-time communication for millions of users worldwide. Our Kubernetes Platform team designs and scales Kubernetes-based platforms that run across public cloud providers and colocation data centers. We operate at massive scale and deep in the stack, from Kubernetes control plane internals and networking to multi-cluster orchestration, automation, and reliability engineering. If you enjoy working close to the metal and shaping modern cloud-native platforms, this role is for you.

About the Team

Engineering operations are the driving force behind the seamless functionality and continuous innovation that define the platform's user experience. The team manages the infrastructure that runs Zoom Phone, Zoom Contact Center, Zoom Virtual Agent, and Zoom Workforce Management. Through robust operational strategies, the team maintains the stability and security of Zoom's infrastructure, allowing millions of users worldwide to connect effortlessly.

Requirements

  • 5+ years of experience in infrastructure, platform, or systems engineering
  • Proficiency in Go (Golang) for building distributed systems and Kubernetes components
  • Experience operating Kubernetes in cloud and/or on-prem / colo environments
  • Knowledge of cloud networking, IAM, load balancing, and service discovery
  • Experience with containers, Helm, CI/CD, and Git-based workflows
  • Debugging skills across Linux, networking, Kubernetes, and distributed systems

Responsibilities

  • Design, build, and operate large-scale Kubernetes platforms spanning public cloud and on-prem / colo data centers.
  • Develop Kubernetes Operators, controllers, and platform automation in Go.
  • Architect high-availability, fault-tolerant, and self-healing systems supporting real-time workloads.
  • Build and evolve internal platform services: cluster lifecycle management, provisioning, configuration, policy, and observability.
  • Own Kubernetes networking, storage, and runtime integrations across heterogeneous environments.
  • Partner with SRE, DevOps, and Product Engineering teams to improve reliability, scalability, and developer experience.
  • Drive infrastructure-as-code and automation using Terraform, Crossplane, and GitOps workflows.
  • Contribute to capacity planning, cost optimization, security hardening, and platform governance.
  • Troubleshoot complex production issues across Kubernetes, cloud infrastructure, and data center environments.

Benefits

  • As part of our award-winning workplace culture and commitment to delivering happiness, our benefits program offers a variety of perks, benefits, and options to help employees maintain their physical, mental, emotional, and financial health; support work-life balance; and contribute to their community in meaningful ways.