Senior Technical Program Manager

Crusoe | Sunnyvale, CA
$161,700 - $196,000

About The Position

Crusoe's Cloud Product team is hiring a Senior Technical Program Manager to own and drive technical programs across our AI IaaS platform, aligning Product and Engineering on delivery. This role requires a hands-on individual contributor with genuine technical depth in AI infrastructure, comfortable operating in ambiguous, fast-moving environments and capable of building execution structure where little exists.

Our vision is the easiest-to-use, purpose-built AI cloud. We offer IaaS products that let AI/ML engineers focus on model frameworks (PyTorch, Ray) and compute stacks (CUDA, ROCm) while Crusoe manages the underlying complexity. Our platform standardizes firmware/OS bundles and automates component orchestration for consistent, scalable infrastructure. We also offer AI Managed Services, such as SLA-bounded Managed Inference.

The TPM connects engineering, product, procurement, and data center operations to deliver a reliable platform where customers run AI workloads without managing low-level system details. At this level, TPM engagement focuses on defined technical programs and workstreams within larger cross-functional initiatives: GPU cluster commissioning, feature delivery within IaaS products, NPI workstream ownership, and coordination across hardware and software dependencies.

The ideal candidate is engineer-rooted, with hands-on technical depth in compute infrastructure, firmware, or hardware platform delivery. You will own defined programs end to end, build lightweight execution frameworks, govern dependencies within your scope, and grow into broader program ownership over time.

Requirements

  • AI Infrastructure Technical Depth: Genuine hands-on knowledge of the AI infrastructure stack — GPU firmware, drivers, BIOS/BMC, CUDA or ROCm stacks, OS configurations, and the tooling required to bring a cluster to production. Ability to engage credibly with firmware, hardware, networking, and compute engineers without a translator.
  • Commissioning Fluency: Familiarity with GPU cluster commissioning — understanding of the distinct workstreams (Compute, SDN, ZTP, Firmware, Network, Storage), commissioning gate definitions, and what it means to take ownership of a cluster from RFN through GA.
  • Experience at Hyperscalers or Neoclouds: 5–10 years of experience as a Technical Program Manager, with meaningful tenure at a hyperscaler (AWS, Azure, Google, Meta, OCI) or neocloud (CoreWeave, Lambda Labs) in a hardware, firmware, compute platform, or IaaS delivery role. Must have been on the build side — not customer-facing or enterprise IT delivery.
  • AI Tool Integration: Active daily use of AI tools to enhance program execution, tracking, and communication.
  • Execution Rigor: Demonstrated ability to establish program predictability in ambiguous environments. Builds lightweight structure that engineering teams trust and adopt.
  • Communication: Clear written and verbal communication for delivering program status, risks, and dependencies to engineering leads and product managers.

Nice To Haves

  • Bachelor's or Master's degree in Engineering, Computer Science, or a related technical field
  • Hands-on engineering background — hardware engineering, systems engineering, silicon validation, embedded software, or firmware
  • Experience with GPU NPI programs, firmware deployment across compute fleets, or datacenter commissioning
  • Experience at a hyper-growth company

Responsibilities

  • Workstream & Program Ownership: Own delivery of defined feature sets or commissioning workstreams within larger IaaS and NPI programs. Drive execution from kickoff through GA or handoff milestone.
  • Hands-On Execution: Personally manage technically complex programs within your scope. Identify blockers early, intervene to correct course, and escalate cross-organizational issues promptly when they exceed your authority.
  • Commissioning Leadership: Own the Cloud TPM side of GPU cluster commissioning for assigned deployments — define RFN acceptance criteria, commissioning readiness requirements, and Go/No-Go criteria. Lead Phase 4.0 commissioning activities including network fabric, storage provisioning, GPU validation, burn-in, and monitoring integration.
  • Risk & Dependency Governance: Proactively identify technical and organizational risks within your programs. Maintain dependency logs, surface blockers before scheduled reviews, and drive issues to resolution.
  • Execution Frameworks: Build lightweight, fit-for-purpose execution structures for your programs — standups, dashboards, status updates, and risk logs. Establish predictability within your workstream.
  • External Partner Coordination: Coordinate with external dependencies such as hardware partners, ODMs, or data center operations teams on assigned programs. Support certification, validation, and infrastructure stack alignment under guidance of senior TPMs.

Benefits

  • Competitive compensation
  • Restricted Stock Units
  • Paid time off & paid holidays
  • Comprehensive health, dental & vision insurance
  • Employer contributions to HSA account
  • Paid parental leave
  • Paid life insurance, short-term and long-term disability
  • Professional development & tuition reimbursement
  • Mental health & wellness support
  • Commuter benefits (parking & transit)
  • Cell phone stipend
  • 401(k) Retirement plan with company match up to 4% of salary
  • Volunteer time off