Senior Director, System Software Engineering - DGX Cloud

NVIDIA, Santa Clara, CA
$384,000 - $575,000

About The Position

NVIDIA is seeking a Senior Director, System Software Engineering, to lead strategy and execution for capacity management in DGX Cloud, building the capacity foundation for NVIDIA's internal AI research clusters. This leader will shape the roadmap for scalable system software that automates GPU management at scale, drive execution across teams and functions, and partner closely with architecture, security, product, and developer platform leaders to deliver reliable, high-performance software that powers the next generation of accelerated computing. The ideal candidate combines deep systems expertise with strong organizational leadership and technical judgment, and builds teams that deliver sophisticated platform software at scale.

Requirements

  • BS, MS, or PhD in Computer Science, Computer Engineering, or a related technical field, or equivalent experience.
  • 16+ years overall of relevant management experience in system software, platform software, or distributed systems engineering, including 7+ years of significant experience leading engineering organizations.
  • Deep technical expertise in operating systems, distributed systems, platform architecture, cloud infrastructure, or large-scale systems software.
  • Demonstrated experience leading delivery of complex software platforms spanning reliability, performance, scalability, security, and observability.
  • Strong record of leadership and influence across engineering, product, program management, and executive teams.
  • Demonstrated success building and leading high-performing teams, developing leaders, and scaling organizations through growth and change.
  • Excellent technical communication and decision-making, with the ability to connect architecture choices to business outcomes.
  • Demonstrated experience with industry-leading AI tools that help engineers and engineering leaders work more efficiently.

Nice To Haves

  • Experience with AI infrastructure, accelerated computing, GPU-optimized software stacks, or large-scale training and inference environments.
  • Experience leading platform software for cloud-native or hybrid-cloud deployments.
  • Track record of driving architectural simplification and operational excellence across large, complex engineering portfolios.
  • Experience partnering with open-source communities and ecosystem partners on platform adoption and enablement.

Responsibilities

  • Define and drive the system software strategy for capacity management and automation for DGX Cloud's GPU cloud platforms, aligning long-range technical direction with business and product priorities.
  • Lead engineering leaders responsible for core platform capabilities such as runtime software, host and cluster management, provisioning, observability, reliability, security, and performance optimization.
  • Build a strong execution model across planning, architecture reviews, release readiness, quality, and operational excellence for software delivered across on-prem and cloud environments.
  • Partner closely with security, DevOps, research, and product organizations to translate platform requirements into scalable software roadmaps and high-quality releases.
  • Establish measurable goals for engineering efficiency, service reliability, software quality, and customer impact, using data to continuously improve delivery and operations.
  • Attract, develop, and retain world-class engineering leaders while fostering technical excellence, accountability, inclusion, and innovation.

Benefits

  • highly competitive salaries
  • comprehensive benefits package
  • equity