Hardware Engineer

STN Inc
San Francisco, CA
Hybrid

About The Position

The Hardware Engineer owns the hardware lifecycle for GPU and supporting infrastructure assets, including fleet health monitoring, RMA workflows, firmware management, and long-range capacity planning. The role is the technical owner of the physical compute platform.

Requirements

  • 5+ years in hardware engineering, systems engineering, or data center engineering
  • Deep knowledge of x86 server architecture, GPU systems, and modern storage
  • Hands-on experience with NVIDIA HGX, DGX, or hyperscale-class systems
  • Strong Linux fundamentals and scripting skills (Python, Bash)
  • Bachelor's degree in computer science, electrical engineering, or a related field

Nice To Haves

  • Experience with NVIDIA Mission Control, Base Command Manager, or Bright Cluster Manager
  • Familiarity with IPMI, Redfish, and vendor management interfaces
  • Knowledge of liquid cooling and high-density power architectures
  • Experience operating fleets of 1,000+ GPUs

Responsibilities

  • Monitor GPU and server health, including thermals, error rates, and component failures
  • Drive the RMA process with vendors (NVIDIA, Supermicro, HPE, and others) end-to-end
  • Manage firmware, BIOS, and BMC upgrade campaigns across the fleet
  • Develop hardware burn-in and acceptance test procedures, including NCCL and stress tests
  • Investigate hardware failures and produce vendor-grade root cause analyses
  • Maintain hardware inventory, asset records, and CMDB accuracy
  • Drive capacity planning across compute, storage, and networking
  • Coordinate with Procurement on spare parts strategy and stocking levels
  • Author hardware engineering runbooks and operational procedures
  • Support new platform bring-up, qualification, and reference architecture validation