About The Position

Vultr is seeking a highly skilled and experienced Senior Technical Product Manager to own the GPU Orchestration product line. This platform powers managed Kubernetes, managed Slurm, SUNK, and Run:ai integration for GPU-based AI and HPC workloads. The ideal candidate combines deep technical fluency in container orchestration, HPC scheduling, and distributed systems with strong product instincts for developer and operator platforms. This highly visible role at a high-growth technology company requires close partnership with the Infrastructure, Compute, Networking, and Platform teams to build a reliable, scalable, and cost-efficient orchestration platform. It is an opportunity to join a fast-growing team and make a significant impact on Vultr and the future of AI infrastructure.

Requirements

  • 7+ years of product management experience in cloud infrastructure, container orchestration, HPC, or developer platforms
  • Deep understanding of Kubernetes, Slurm, or similar orchestration and scheduling systems, including GPU scheduling, resource management, and multi-tenant isolation
  • Experience defining product strategy and roadmaps for platform or infrastructure products at scale
  • Strong technical background — ability to engage with engineering on cluster lifecycle, control plane reliability, API design, and distributed systems
  • Experience with AI/ML infrastructure, including training workloads, inference serving, and GPU resource optimization
  • Track record of shipping developer- and operator-facing products with measurable impact on reliability, adoption, or operational efficiency
  • Experience working across cross-functional teams (engineering, design, marketing, sales) in a fast-paced environment
  • Excellent written and verbal communication skills, with the ability to translate complex technical concepts for diverse audiences
  • Bachelor's degree in Computer Science, Engineering, or a related field (or equivalent experience)

Responsibilities

  • Define and execute the roadmap for managed Kubernetes, managed Slurm services, SUNK, and Run:ai integration
  • Own the end-to-end cluster lifecycle, including provisioning, configuration, upgrades, scaling, high availability, and decommissioning
  • Establish scheduling and resource management capabilities for GPU workloads, including quotas, fair-share policies, multi-tenant isolation, and priority handling
  • Drive integration between orchestration services and core infrastructure components, including networking, storage, identity, observability, and billing systems
  • Define service-level objectives for control plane reliability, job scheduling latency, cluster availability, and upgrade stability
  • Design APIs, CLI tooling, and UI workflows that enable self-service cluster management and workload operations
  • Partner with customer-facing teams to understand training, inference, and HPC use cases, translating real workload requirements into product capabilities
  • Monitor industry trends in container orchestration, HPC scheduling, distributed systems, and AI infrastructure to inform product direction

Benefits

  • 100% company-paid insurance premiums for employee medical, dental and vision plans
  • 401(k) plan that matches 100% up to 4%, with immediate vesting
  • Professional Development Reimbursement of $2,500 each year
  • 11 Holidays + Paid Time Off Accrual + Rollover Plan
  • Increased PTO at 3-year and 10-year anniversaries
  • 1 month paid sabbatical every 5 years
  • Anniversary Bonus each year
  • $500 stipend for remote office setup in first year + $400 each following year
  • Internet reimbursement up to $75 per month
  • Gym membership reimbursement up to $50 per month
  • Company paid Wellable subscription