Senior Engineering Manager - Next-generation Kubernetes platform

Nutanix | San Jose, CA
$195,200 - $391,200 | Hybrid

About The Position

We are looking for a Senior Engineering Manager to lead the design, development, and scaling of a next-generation Kubernetes platform powering enterprise environments. This platform will serve as the foundation for AI/ML workloads, GPU infrastructure, and enterprise applications, delivering hyperscaler-like capabilities in on-prem and hybrid deployments. You will lead a team responsible for building a production-grade, globally scalable Kubernetes platform, including cluster lifecycle, fleet management, multi-tenancy, and deep integration with compute (CPU/GPU), networking, and storage systems.

Requirements

  • Proven experience leading and scaling high-performing engineering teams
  • Ability to drive clarity, ownership, and execution in complex, ambiguous problem spaces
  • Strong understanding of distributed systems at scale
  • Hands-on familiarity with cloud platforms, infrastructure systems, or PaaS offerings
  • Experience building large, meaningful production systems (cloud platforms, infrastructure, or PaaS)
  • Platform and systems thinking, including experience designing multi-tenant platforms with clear abstractions (projects, quotas, policies)
  • Familiarity with multi-cluster / fleet management and large-scale system design
  • Ability to balance long-term architecture with near-term delivery
  • Track record of delivering reliable, production-grade systems
  • Experience with SLOs, observability, incident management, and lifecycle operations
  • Strong ability to work across product, hardware, and field teams
  • Effective executive-level communication and stakeholder management

Nice To Haves

  • Kubernetes experience is desirable but not required; we welcome leaders who are excited to learn Kubernetes deeply and apply strong systems fundamentals to this domain
  • Exposure to AI/ML workloads or GPU-based systems is a plus
  • A strong platform engineering background and excitement to grow into AI infrastructure; this role offers the opportunity to learn and build in the rapidly evolving space of GPU scheduling, training, and inference systems

Responsibilities

  • Own end-to-end delivery of key platform capabilities, including cluster lifecycle, fleet management, and multi-tenancy
  • Drive the design of large-scale distributed systems, evolving toward global control planes and cell-based architectures
  • Lead a team of engineers to build AI-native infrastructure, including GPU-aware scheduling, resource isolation, and workload orchestration
  • Partner closely with Product and cross-functional teams to translate enterprise and AI use cases into platform capabilities
  • Establish a strong operational excellence culture, including SLOs, reliability engineering, and production readiness
  • Simplify complex infrastructure into intuitive, consumable platform experiences for enterprise users

Benefits

  • Sign-on bonus
  • Restricted stock units
  • Discretionary awards
  • Full range of medical, financial, and/or other benefits
  • 401(k) eligibility
  • Various paid time off benefits, such as vacation, sick time, and parental leave
© 2026 Teal Labs, Inc