VP of Product, Research and Training Infrastructure

CoreWeave | Sunnyvale, CA
4d | $233,000 - $341,000 | Hybrid

About The Position

As CoreWeave continues to solidify its position as the Essential Cloud for AI, we are seeking a visionary VP of Research Training Infrastructure. This executive leader will own the product strategy and engineering execution for the services that power the most ambitious AI research labs in the world. You will bridge the gap between "the metal" and the researcher, delivering a seamless, high-performance environment where frontier models are born.

The Role: Architect of the AI Factory

You will lead the product strategy of our Research Training Stack, focusing on the specialized orchestration, evaluation, and iteration tools required for massive-scale pre-training and post-training. This is a mission-critical role at the intersection of high-performance computing (HPC) and cloud-native agility. In 2026, CoreWeave is the foundation of the largest infrastructure buildout in human history. We are building AI Factories, not just data centers.

Requirements

  • Proven Leadership: 15+ years of experience in engineering leadership, including at least 5 years managing large-scale infrastructure at a top-tier research lab or an AI-native cloud provider.
  • Domain Expertise: Deep, hands-on knowledge of Slurm, Kubernetes, and the specific networking requirements (InfiniBand/RDMA) for distributed training clusters.
  • Research Mindset: You likely come from a background supporting frontier model research (pre-training and post-training) and understand the "pain points" of a research scientist.
  • Scaling Experience: A track record of delivering mission-critical services on multi-thousand GPU clusters (H100/Blackwell/Rubin architectures).
  • Strategic Vision: Ability to define "what’s next" in the AI stack, from automated RL loops to specialized sandbox environments.

Responsibilities

  • Frontier Orchestration: Oversee the evolution of SUNK (Slurm on Kubernetes) to provide researchers with deterministic, bare-metal performance through a cloud-native interface.
  • Holistic Training Services: Beyond Slurm, drive the development of next-generation orchestrators and automated, training-integrated evaluation frameworks that ensure model quality throughout the lifecycle.
  • Post-Training Excellence: Build the infrastructure required for sophisticated Reinforcement Learning (RL) and RLHF pipelines, enabling labs to refine foundation models with maximum efficiency.
  • Customer Advocacy: Act as the primary technical partner for lead researchers at global AI labs, translating their "future-state" requirements into actionable product roadmaps.

Benefits

  • Medical, dental, and vision insurance - 100% paid for by CoreWeave
  • Company-paid Life Insurance
  • Voluntary supplemental life insurance
  • Short and long-term disability insurance
  • Flexible Spending Account
  • Health Savings Account
  • Tuition Reimbursement
  • Ability to Participate in Employee Stock Purchase Program (ESPP)
  • Mental Wellness Benefits through Spring Health
  • Family-Forming support provided by Carrot
  • Paid Parental Leave
  • Flexible, full-service childcare support with Kinside
  • 401(k) with a generous employer match
  • Flexible PTO
  • Catered lunch each day in our office and data center locations
  • A casual work environment
  • A work culture focused on innovative disruption

What This Job Offers

  • Job Type: Full-time
  • Career Level: Executive
  • Education Level: No Education Listed
  • Number of Employees: 251-500
