Kubernetes Platform Engineer – Big Data Infrastructure

Zoom • San Jose, CA

1d • Hybrid

About The Position

You will contribute to Zoom’s Big Data Infrastructure platform, designing and operating open‑source compute engines on Kubernetes. You will help build reliable, scalable systems that power analytics, machine learning, telemetry, and product insights across Zoom. The Big Data Infrastructure team runs open‑source data engines such as Spark, Flink, and Trino on Kubernetes. The team owns the engine runtimes, automation, observability, multi‑tenant operations, and data lake integrations that support Zoom’s global analytics needs.

Requirements

Have experience running workloads on Kubernetes in production
Possess hands‑on expertise with Spark, Flink, or Trino
Build infrastructure with Terraform, Helm, and GitOps practices
Operate cloud environments (ideally AWS EKS)
Demonstrate understanding of distributed systems performance and architecture
Show solid debugging and root‑cause analysis skills
Drive improvements in platform reliability and scalability
Collaborate effectively across cross‑functional teams
Learn new open‑source engines and tools quickly
Communicate clearly in technical discussions and design reviews

Responsibilities

Designing Kubernetes infrastructure to run distributed compute engines
Building automation and IaC modules using Terraform and Helm
Implementing multi‑tenant resource isolation, RBAC, and secure access patterns
Operating engine runtimes and control plane components on EKS
Integrating data ingestion systems and object‑storage data lakes
Managing table operations such as compaction, retention, and partition management
Monitoring engine and cluster performance using modern observability tools
Debugging distributed jobs across Spark, Flink, and Trino runtimes
Automating CI/CD workflows and engine upgrade processes
Collaborating with data, ML, and SRE teams on platform improvements