Join our Platform Engineering team to design, build, and operate large-scale, on-prem Kubernetes infrastructure powering next-generation AI/ML platforms, including GPU-enabled environments for both traditional ML and state-of-the-art LLM workloads. You will play a pivotal part in defining and evolving the highly scalable Kubernetes platform that underpins these workloads. The role combines deep Kubernetes platform engineering with AI/ML infrastructure enablement, ensuring performance, reliability, and scalability across distributed systems. You will lead technical direction across Kubernetes control plane operations, cluster lifecycle management, and platform extensibility, working closely with data scientists, ML engineers, and infrastructure teams to support production AI workloads at scale. This is a senior individual contributor role focused on platform ownership, engineering excellence, and driving reliability and automation across complex distributed environments.
Job Type: Full-time
Career Level: Senior
Education Level: No Education Listed