About The Position

CoreWeave is seeking a Staff Technical Program Manager to lead complex, cross-functional programs across Cluster Orchestration and Applied Training within our AI/ML Platform Services organization. Cluster Orchestration is the platform layer that ensures large AI workloads are scheduled, launched, and managed reliably across CoreWeave’s clusters. Applied Training is the layer built on top of that infrastructure. It helps researchers and customers use the infrastructure for pre-training, fine-tuning, reinforcement learning, evaluations, and sandboxed experimentation.

In this role, you will partner with engineering, product, infrastructure, and research-adjacent teams to improve both how workloads run on the cluster and how users interact with the training platform built on top of it. This includes driving programs across orchestration systems such as Slurm-on-Kubernetes (SUNK), Kueue, and workflow integrations. It also includes helping scale the environments, tooling, and operational mechanisms that make training and evaluation workflows easier to use.

This is a highly cross-functional role for a TPM who combines strong technical depth, excellent execution instincts, and the ability to bring structure and clarity to fast-moving infrastructure and AI platform initiatives.

Requirements

  • Bachelor’s degree in Computer Science, Engineering, or a related technical field, or equivalent practical experience.
  • 8+ years of technical program management experience in cloud infrastructure, distributed systems, or AI/ML platforms.
  • Experience leading large-scale cross-functional programs involving scheduling systems, cluster infrastructure, or ML platform capabilities.
  • Strong technical fluency in Kubernetes, Slurm or comparable schedulers, distributed systems, and AI training workflows.
  • Demonstrated ability to define program metrics and deliver measurable outcomes in performance, reliability, scale, or operational maturity.
  • Excellent communication skills, with experience influencing engineering, product, and executive stakeholders.

Nice To Haves

  • Experience with orchestration and scheduling technologies such as Kubernetes, Slurm, Kueue, Ray, or similar systems.
  • Familiarity with modern AI training and evaluation workflows, including pre-training, supervised fine-tuning, reinforcement learning, and experiment or sandbox environments.
  • Understanding of GPU infrastructure, cluster capacity planning, multi-tenant execution, and distributed training tradeoffs.
  • Experience building launch processes, release governance, dependency management, and operational review mechanisms in fast-scaling environments.
  • Familiarity with AI developer and research tooling such as W&B, SkyPilot, or adjacent ecosystem platforms.

Responsibilities

  • Drive end-to-end program execution for cluster orchestration initiatives spanning workload scheduling, self-service provisioning, upgrade and migration flows, and platform integrations.
  • Lead cross-functional programs that improve how AI training, evaluation, RL, and mixed workloads run across CoreWeave clusters.
  • Partner with engineering and product leaders to define roadmap priorities and deliver measurable improvements in utilization, reliability, scalability, observability, and user experience.
  • Drive delivery for applied training initiatives across pre-training, fine-tuning, reinforcement learning, sandbox environments, and evaluation systems.
  • Coordinate dependencies across platform engineering, infrastructure, product, customer-facing teams, and ecosystem partners to ensure successful launches and clear operational ownership.
  • Build program mechanisms for release readiness, rollout planning, risk management, stakeholder communication, and post-launch review.
  • Establish success metrics, dashboards, and operating cadences to improve cluster efficiency, workload startup performance, time-to-research, and adoption of new platform capabilities.
  • Create clarity across ambiguous technical programs by aligning stakeholders, surfacing tradeoffs early, and driving decisions to resolution.

Benefits

  • Medical, dental, and vision insurance - 100% paid for by CoreWeave
  • Company-paid life insurance
  • Voluntary supplemental life insurance
  • Short- and long-term disability insurance
  • Flexible Spending Account
  • Health Savings Account
  • Tuition Reimbursement
  • Ability to participate in the Employee Stock Purchase Program (ESPP)
  • Mental Wellness Benefits through Spring Health
  • Family-Forming support provided by Carrot
  • Paid Parental Leave
  • Flexible, full-service childcare support with Kinside
  • 401(k) with a generous employer match
  • Flexible PTO
  • Catered lunch each day in our office and data center locations
  • A casual work environment
  • A work culture focused on innovative disruption