Senior Product Quality Engineer

NVIDIA, Santa Clara, CA
$116,000 - $235,750

About The Position

NVIDIA is seeking a Product Quality Engineer to join our Systems Product Quality team. This role serves as the system-level power debug domain expert for customer returns and field failures. The position will lead technical failure analysis for NVIDIA data center systems, compute trays, and compute modules, with a specific focus on customer-reported field issues related to power delivery, power sequencing, and intermittent power events. The ideal candidate will possess strong hands-on debug capabilities, a structured root-cause mindset, and the ability to link board-level signals to system-level behavior. The role owns customer-return and field-failure power investigations from symptom confirmation through containment, root cause analysis, corrective action, and quality learning closure.

Requirements

  • Bachelor's degree or equivalent experience in Electrical Engineering, Electronic Engineering, or a related field; Master's degree preferred.
  • 5+ years of hands-on experience in hardware debug, customer return analysis, field failure analysis, or power electronics support for complex electronic systems.
  • Strong understanding of system power delivery, DC-DC converters, multiphase VRs, regulators, power sequencing, current sharing, sense circuits, protection circuits, and high-current low-voltage rails.
  • Proven ability to debug power issues at system, board, and component level by reading schematics, PCB layouts, power trees, design specifications, and test logs.
  • Experience with Linux systems, Linux shell scripts, BMC/IPMI/Redfish-style logs or telemetry, and basic automation for data collection and debug efficiency.
  • Strong analytical and problem-solving skills, including structured troubleshooting, design of experiments, root cause analysis, statistical process control, and quality data analysis.
  • Ability to work across engineering, customer quality, supplier, and customer-facing teams while maintaining clear technical ownership and urgency.
  • Excellent written and spoken English, strong documentation habits, and the ability to explain complex debug findings to both technical and non-technical audiences.
  • High sense of responsibility, self-motivation, collaborative working style, and comfort driving ambiguous technical issues to closure.

Nice To Haves

  • Experience debugging high-power server or data center platforms in customer-return or field-failure analysis workflows.
  • Hands-on familiarity with PSU/PDU behavior, rack-level power distribution, power capping, power transients, or data center deployment conditions observed in field returns.
  • Experience with board-level power design, hardware verification, power integrity measurement, or design-for-debug improvements.
  • Knowledge of quality and reliability concepts, 8D problem solving, customer failure reporting, RMA/FA workflow, and supplier corrective action processes.
  • Ability to confirm, bound, and translate power-related field failures into corrective actions, debug playbooks, and prevention feedback for design, customer quality, and supplier teams.

Responsibilities

  • Lead system-level power failure analysis for customer returns and field failures across data center systems, compute trays, and compute modules.
  • Confirm, reproduce, and isolate complex power failures such as no power, intermittent boot, unexpected shutdown, brown-out, rail droop, over-current protection, under-voltage protection, sequencing faults, hot-plug events, and margin-related failures.
  • Analyze system power architecture from AC/DC input through PSU, PDU, hot-swap, eFuse, VR, regulator, current-sense, and board-level power rails to determine the true failure boundary.
  • Use oscilloscopes, current probes, DMMs, BMC-reported voltage/current readings, system event logs, and Linux-based diagnostics to build fact-based debug conclusions.
  • Correlate field return data, customer logs, firmware behavior, board schematics, PCB layout, BOM history, and telemetry trends to identify root cause and assess risk.
  • Partner with hardware design, power design, firmware, customer quality, reliability, manufacturing, and supplier quality teams to resolve critical customer and field issues.
  • Drive containment, failure analysis, corrective and preventive actions, and defect-prevention feedback with clear ownership and closure criteria.
  • Create concise technical reports, quality updates, and executive-ready summaries that communicate failure mechanism, impact, risk, mitigation, and next steps.

Benefits

  • Highly competitive salaries
  • Comprehensive benefits package
  • Equity