Staff Hardware Systems Engineer

Crusoe
San Francisco, CA
2d
$209,000 - $253,000

About The Position

Crusoe's mission is to accelerate the abundance of energy and intelligence. We’re crafting the engine that powers a world where people can create ambitiously with AI, without sacrificing scale, speed, or sustainability. Be a part of the AI revolution with sustainable technology at Crusoe. Here, you'll drive meaningful innovation, make a tangible impact, and join a team that’s setting the pace for responsible, transformative cloud infrastructure.

About This Role

We are seeking a Hardware Production / Sustaining Engineer to strengthen Crusoe’s Hardware Systems Engineering team and close critical skill gaps in debugging, validation, and production support of high-performance compute systems. In this role, you will take ownership of the full hardware lifecycle, from prototype bring-up to large-scale production, while driving automation, deep issue resolution, and reliability across Crusoe Cloud’s GPU- and CPU-based infrastructure.

You will work closely with cross-functional teams to support, debug, and improve hardware platforms at scale, with a particular focus on PCIe, InfiniBand, and NVMe/storage, which have been identified as essential areas for deeper expertise. Your work will directly impact Crusoe’s ability to deploy and operate sustainable, AI-first compute systems with world-class performance and reliability.

Requirements

  • 8–10+ years of experience in hardware development, validation, sustaining engineering, or production engineering.
  • Strong hands-on expertise in PCIe, InfiniBand, and NVMe/storage debugging and development.
  • Deep proficiency in hardware bring-up, board-level debugging, and system-level validation.
  • Ability to design and implement automation frameworks for hardware testing (Python, Shell, or similar).
  • Technical background in digital and analog design, server architecture, and high-performance compute hardware.
  • Experience working across thermal, mechanical, firmware, and software functions in multidisciplinary environments.
  • Strong analytical and problem-solving skills with a data-driven approach.
  • Excellent communication and collaboration skills for working with internal teams and external partners.
  • Bachelor’s or Master’s degree in Electrical Engineering, Computer Engineering, or equivalent experience.

Nice To Haves

  • Experience designing or optimizing GPU-to-GPU communication architectures for AI/ML workloads.
  • Direct experience integrating NVLink or other next-generation GPU interconnect technologies.
  • Familiarity with cutting-edge GPU architectures and how to leverage them in AI/HPC environments.
  • Expertise supporting or designing systems across both ARM and x86 server architectures.
  • Background in sustainable or energy-efficient hardware design practices.
  • Advanced certifications or coursework in AI/HPC hardware systems.

Responsibilities

  • Drive the full hardware development and sustaining lifecycle, including feasibility, bring-up, validation, deployment, and ongoing production support.
  • Develop and maintain scripting and automation frameworks for hardware testing, diagnostics, and continuous reliability improvements.
  • Lead deep troubleshooting and debugging across:
      • PCIe (link training, topology, performance issues)
      • InfiniBand (fabric debugging, throughput, connectivity issues)
      • NVMe/storage (performance bottlenecks, firmware interactions, failure analysis)
  • Conduct rigorous system validation and characterization for GPU, CPU, and high-performance compute platforms.
  • Support end-to-end (E2E) integration and solution testing to ensure Crusoe Cloud products meet performance, reliability, and scalability expectations.
  • Collaborate with mechanical, thermal, firmware, software, and manufacturing teams to resolve system-level issues and enable stable production operation.
  • Drive prototyping, qualification, and readiness for high-volume manufacturing with both internal teams and external vendors.
  • Identify opportunities for new hardware technologies, testing methods, and sustainability improvements aligned with Crusoe’s long-term objectives.
  • Provide data-driven insights to influence Crusoe’s hardware roadmap and reliability strategy.

Benefits

  • Industry-competitive pay
  • Restricted Stock Units in a fast-growing, well-funded technology company
  • Health insurance package options that include HDHP and PPO, vision, and dental for you and your dependents
  • Employer contributions to HSA accounts
  • Paid Parental Leave
  • Paid life insurance, short-term and long-term disability
  • Teladoc
  • 401(k) with a 100% match up to 4% of salary
  • Generous paid time off and holiday schedule
  • Cell phone reimbursement
  • Tuition reimbursement
  • Subscription to the Calm app
  • MetLife Legal
  • Company-paid commuter benefit: $300 per month