About The Position

NVIDIA DGX systems are the foundation of the world’s most advanced AI infrastructure: purpose-built servers, workstations, and personal AI computers that bring together GPUs, CPUs, NVLink, NVIDIA Networking, and a fully optimized AI software stack. We are seeking an engineering leader responsible for end-to-end delivery of every DGX compute system, from firmware through the AI stack to customer deployment. The successful candidate will ensure that each DGX product ships as a production-ready system in which firmware, OS, drivers, CUDA, networking, and AI applications work together seamlessly, while also driving the architecture and roadmap for next-generation platforms.

Requirements

  • BS or MS in Computer Science, Electrical Engineering, or a related field, or equivalent experience.
  • 12+ years overall in systems firmware/software engineering, including 5+ years in engineering leadership.
  • Deep expertise in the server system stack (SBIOS, BMC, OS, and applications) and in system-level integration of complex multi-component products.
  • Proven track record delivering multi-generation server or data center platforms from architecture through customer deployment.
  • Experience managing engineering organizations across multiple geographies in a matrix environment.
  • Strong understanding of server hardware: CPU, GPU, interconnect, memory, PCIe, power delivery.
  • Experience owning end-to-end product quality—from firmware validation through full-stack system testing to field deployment.

Nice To Haves

  • Experience with NVIDIA DGX or other GPU-accelerated server platforms.
  • Track record of driving server bring-up for new silicon and system architecture redesigns.
  • Familiarity with DMTF Redfish, OCP standards, and server manageability ecosystems.
  • Experience with AI/DL workload validation and performance optimization at the platform level.
  • Demonstrated ability to operate at the VP/SVP level, influencing cross-BU strategic decisions.

Responsibilities

  • Ensure every DGX platform is ready for the full NVIDIA software stack—firmware, DGX OS, GPU drivers, CUDA toolkit, DCGM, DOCA/OFED, and management tools—as a validated, production-quality product.
  • Own the GA SW/FW release process, delivering firmware bundles, BaseOS ISOs, and release notes to OEM/OSV partners.
  • Ensure platforms support AI agents such as NemoClaw and Hermes agents, NIM microservices, and the other workloads customers expect out of the box.
  • Lead development of the manageability firmware stack (BMC, BIOS) for all DGX platforms.
  • Ensure firmware from partner teams (GPU, CPU, networking) integrates correctly at system level.
  • Manage third-party vendors and drive platform requirements (NVPOR) across all firmware areas.
  • Define the validation strategy that proves each DGX platform is production-ready: end-to-end system validation, including firmware regression, NVQual certification, DL workload performance, OS/CUDA stack testing, multi-user scenarios, power/thermal validation, and field upgrade reliability.
  • Establish quality gates and enforce a zero-ship-stopper discipline.
  • Drive platform bring-up for each new DGX system—coordinating first boot across new silicon (CPU, GPU), board design, and firmware teams.
  • Own architectural strategy for next-generation platforms including firmware update mechanisms, system security posture, and AI application readiness.
  • Ensure firmware release flows meet CSP and enterprise deployment requirements.
  • Represent DGX platform readiness in executive reviews and strategic planning with VP/SVP leadership.
  • Engage with industry standards bodies (DMTF Redfish, OCP).
  • Own the complete DGX delivery lifecycle—system architecture, firmware development, integration, full-stack validation, GA release, and customer deployment—for every DGX product.
  • Serve as single point of accountability for DGX platform readiness across NVIDIA—aligning GPU, CPU, networking, security, OS, and AI software teams to deliver on schedule.
  • Own root cause and corrective action (RCCA) processes for field issues.
  • Manage external vendor partnerships (AMI for SBIOS, BMC contributors) with clear quality gates and program tracking.
  • Build and lead a world-class engineering organization.
  • Mentor and develop leaders.
  • Foster a culture of technical excellence, intellectual honesty, and customer obsession.

Benefits

  • Equity