Hardware Reliability Engineer

DensityAI
Mountain View, CA
$220,000 - $350,000

About The Position

Own the reliability of the advanced packages and systems that turn our AI accelerator silicon into products that survive years in the field. You'll define how we qualify 2.5D/3D and heterogeneously integrated packages, model their physics of failure, drive root-cause analysis when things fail, and build the reliability engineering that lets us predict lifetime under real workloads. You'll sit at the seam between silicon/packaging and the systems our accelerators run in, partner closely with our OSATs, and own the answer to "will this hold up in the field, and for how long?"

Requirements

  • MS or Ph.D. in Materials Science, Mechanical Engineering, Electrical Engineering, Applied Physics, or related field
  • 5+ years in 2.5D/3D advanced packaging reliability
  • Deep command of physics-of-failure methodology and strong materials-science knowledge, particularly interconnects and interfaces
  • Proficiency in statistical reliability analysis (Weibull, lognormal, acceleration modeling; JMP, Minitab, or Python)
  • Hands-on failure analysis with C-SAM, X-ray CT, SEM, TEM, FIB, and EBSD
  • Proven track record driving OSAT/partner improvements and managing qualifications
  • Familiarity with JEDEC, IPC, IEEE, and MIL-STD standards

Nice To Haves

  • Heterogeneous integration, fan-out packaging, chiplet architectures, HBM, or silicon-photonics packaging
  • Electrical reliability mechanisms (electromigration, time-dependent dielectric breakdown (TDDB))
  • Design-for-reliability (DFR), prognostics, and health management for electronic systems
  • AI-driven reliability modeling or machine learning for failure prediction
  • High-power / high-current package reliability for accelerators or GPUs; customer-facing qualification experience

Responsibilities

  • Conduct physics-of-failure modeling for advanced accelerator packaging; assess thermal, mechanical, and electrical stressors; define and execute stress-test protocols including thermal cycling, electromigration, HTOL, HAST/uHAST, and power cycling
  • Lead failure-mode analysis using C-SAM, X-ray CT, SEM, TEM, FIB, and EBSD; identify cracking, voiding, electromigration, and stress-induced damage; drive corrective/preventive action (8D, FMEA)
  • Build and apply models (Coffin-Manson, Arrhenius, Black's equation) and FEA-based stress simulation to predict field lifetime and FIT under real accelerator thermal and power profiles
  • Partner with assembly and test providers on reliability improvements; define requirements, ensure JEDEC/IPC/IEEE/MIL-STD compliance, monitor OSAT performance, and support supplier audits and qualifications
  • Assess thermal, mechanical, and electrical stress interactions across package, board, and the system the accelerator ships in; drive design-for-reliability into the package and board ↔ package interface with packaging, materials, SI/PI, and thermal
  • Develop design guidelines and reliability best practices, and own the reliability data presented to internal teams and customers
  • Translate package- and system-level reliability into fleet availability targets — AFR, FIT, MTBF/MTTR, and availability "nines"; drive detection and mitigation of silent data corruption (SDC) / silent data errors in production; close the loop from field telemetry, returns, and RMA back into design and qual (reliability growth); partner with data-center operations, SRE/hardware-ops, and customers on serviceability and uptime for large-scale training and inference
  • Use and develop AI-assisted / ML tool flows to accelerate failure analysis, lifetime modeling, and failure prediction

Benefits

  • Full compensation packages are based on candidate experience and relevant certifications.