Failure Analysis Engineering Manager, GPU ASIC and PCBA Debug

Advanced Micro Devices, Inc.
Secaucus, NJ
Onsite

About The Position

The Quality Engineering team is looking for an experienced GPU ASIC and PCBA Debug and Failure Analysis Engineering Manager to lead and develop a team of failure analysis (FA) engineers. This role is intended for a proven people manager with prior experience building, mentoring, and guiding high-performing engineering teams, while also serving as a strong technical lead in GPU ASIC and board-level (PCBA) failure analysis. The individual will oversee customer and factory failure investigations for GPU accelerators, help drive failure reproduction and isolation, and work closely with cross-functional teams including design, validation, firmware, and manufacturing to accelerate root cause analysis and corrective actions. Your contributions will directly impact team effectiveness, product quality, reliability, and customer satisfaction.

Requirements

  • Bachelor’s degree in Electrical Engineering, Computer Engineering, or a related field.
  • 3+ years of management experience.

Nice To Haves

  • Experience leading and developing engineering teams, with a strong track record of hiring, coaching, mentoring, and growing FA engineers.
  • Deep expertise in GPU ASIC debug, validation, and functional or stress test development.
  • Strong background in PCBA diagnostics, failure analysis, and board-level debug from NPI through production.
  • Experience leading triage across power, ASIC, firmware, and thermal failure domains.
  • Strong hands-on lab experience with oscilloscopes, logic analyzers, and custom debug tools.
  • Solid understanding of firmware, drivers, and hardware interactions in complex system debug.
  • Extensive experience in hardware verification, system integration, and failure reproduction.
  • Proficient in Python, shell scripting, and working across Windows and Linux environments.
  • Strong leadership, communication, and presentation skills, with the ability to teach, mentor, and lead by example.
  • Able to read schematics, interpret datasheets, identify components, and support board-level debug and rework.
  • Knowledge of high-speed digital design, HBM or GDDR memory, PCIe, and GPU data center systems is a plus.

Responsibilities

  • Provide technical leadership for triage and debug of complex GPU and PCBA failures across power, ASIC, firmware, and thermals, guiding the FA team to root cause.
  • Lead failure reproduction and triage by defining debug plans, directing investigations, and guiding experiments and escalation paths for complex issues.
  • Drive debug automation, diagnostic tools, and data analysis methods that improve triage efficiency and consistency across failure domains.
  • Lead cross-functional triage with manufacturing partners and AMD teams to align on failure hypotheses, reproduction, and root cause.
  • Guide board-level debug using schematics, layouts, and design documentation to direct analysis and mentor engineers through the process.
  • Ensure clear documentation of failure analysis results, root cause findings, and corrective actions for customer and internal use.
  • Present technical findings, triage updates, risks, and recovery plans to stakeholders and senior leadership.
  • Drive continuous improvement of FA methods, triage processes, and best practices across power, ASIC, firmware, and thermal debug.
  • Manage and develop a team of FA engineers by setting priorities, providing technical guidance, and coaching through complex investigations.

Benefits

  • AMD benefits at a glance