Senior Site Reliability Engineer

Zoom · Bengaluru, IN
15h · Remote

About The Position

We are looking for a Senior Site Reliability Engineer (SRE) to support our Kubernetes platforms and customer-facing data systems. In this role, you will improve system reliability, scalability, and day-to-day operations across our distributed infrastructure and data platforms. You will partner with Infrastructure, Data Platform, and Application Engineering teams to reduce operational workload, improve incident response, and drive automation across multi-region environments.

This Senior SRE will sit in the Infrastructure and Data Systems organization, which is responsible for building, operating, and scaling Zoom’s core data platforms and cloud infrastructure that power critical business and customer-facing workloads globally. The team works at the intersection of reliability, scalability, and developer productivity, enabling engineering teams to move fast while maintaining security and operational excellence.

Requirements

  • Have 6+ years of experience in SRE, Platform Engineering, or Infrastructure roles.
  • Show hands-on experience with Kubernetes (K8s) in production environments.
  • Have experience with Linux systems, networking fundamentals, and distributed systems.
  • Show experience with monitoring and observability tools (Prometheus, Grafana, Datadog, PagerDuty, etc.).
  • Demonstrate effective programming/scripting skills in Python, Go, or Shell.
  • Be able to build and operate CI/CD pipelines (GitHub Actions, Jenkins, ArgoCD, etc.) and support data platforms (Spark, Trino/Presto, Airflow, Kafka) in production.
  • Have hands-on experience with cloud platforms (AWS, GCP, Azure), as well as incident management, troubleshooting, and root cause analysis (RCA) skills.

Nice To Haves

  • Bring experience in data platform reliability and automation; experience with AI-assisted operations is a bonus.