About The Position

NVIDIA’s NPI Operations Team is looking for a highly motivated System Product Development Engineer to lead the development and productization of DGX products through mass production ramp. This role focuses on DGX systems and L11 rack-scale AI supercomputer reference platforms—some of the most advanced computing systems in the world for Artificial Intelligence. These products form the foundation for training and deploying industry-leading LLMs and represent NVIDIA’s fastest-growing business segment and largest market opportunity.

Requirements

  • Expertise in server platform architecture, CPU/GPU baseboards, and high-speed interfaces with proven system-level debug skills and exceptional diagnostic instincts.
  • Strong knowledge of BMC, firmware architecture, and manufacturing diagnostics.
  • Familiarity with L11 integration processes.
  • Excellent communication skills to articulate problems and deliver clear recommendations.
  • Strong analytical skills to synthesize complex information and provide actionable guidance.
  • Leadership skills to manage factory operations and drive issue resolution.
  • Collaborative mindset to work seamlessly with cross-functional teams and external partners.
  • Results-driven approach to achieving optimal outcomes across all aspects of NPI operations.
  • 12+ years of experience in system engineering or debug, or equivalent relevant experience.
  • BS or higher in Electrical Engineering or Computer Engineering, or equivalent experience.

Responsibilities

  • Drive development and productization of NVIDIA’s DGX datacenter products and L11 systems.
  • Lead debug efforts for L11 rack-level integration, creating and applying tools/scripts for failure identification and root cause analysis.
  • Provide clear, actionable guidance to factories to resolve issues quickly and implement corrective actions that improve manufacturing quality and efficiency.
  • Develop and document robust, stable recipes—including diagnostics, firmware, and software—for mass production ramp.
  • Review and provide feedback on test plans and factory acceptance criteria, focusing on yield, quality, and efficiency.
  • Collaborate with development, validation, and manufacturing test engineering teams to understand diagnostic and firmware release plans and their impact on system stability and quality.
  • Influence test engineering and diagnostic teams to enhance test methodology, telemetry, and debug capabilities so that failing FRUs are identified precisely and firmware/test issues are isolated, and to improve test coverage to prevent downstream escapes.
  • Partner with product development teams to ensure designs are optimized for manufacturing.
  • Present NPI status updates and critical issues to executive management.

Benefits

  • Competitive salaries
  • Generous benefits package
  • Equity