Senior Manager, DGX Cloud Technical Program Management

NVIDIA
Santa Clara, CA
Hybrid

About The Position

For over 25 years, NVIDIA has led the world in visual computing and accelerated computing. Today, we’re crafting the future of AI by driving breakthroughs in generative models, autonomous systems, and large-scale research. The DGX Cloud organization builds and operates the AI infrastructure that makes this innovation possible. We are seeking a Senior Manager of Technical Program Management to lead core infrastructure programs across DGX Cloud, including network, storage, trust services, security, break/fix operations, and telemetry. This role manages a team of TPMs responsible for bringing structure, operational rigor, and cross-functional alignment to infrastructure programs that keep DGX Cloud resilient, scalable, and customer-ready.

Requirements

  • More than 12 years of overall experience in technical program management, infrastructure program management, or similar roles, including more than 3 years directly managing TPMs.
  • Experience managing infrastructure programs in domains such as networking, storage, security, trust services, observability, telemetry, or cloud operations.
  • Strong ability to manage priorities, dependencies, risks, and execution plans across multiple engineering teams.
  • Experience building TPM operating rhythms, including status reviews, escalation paths for blocking issues, milestone tracking, and leadership-ready updates.
  • Working knowledge of cloud infrastructure, distributed systems, or large-scale platform operations.
  • Strong communication skills with the ability to translate complex infrastructure work into clear program status, risks, and decisions.
  • Bachelor’s or Master’s degree in Computer Science, Engineering, or related field, or equivalent experience.

Nice To Haves

  • Experience supporting infrastructure for AI/ML platforms, GPU clusters, or large-scale cloud services.
  • Background with observability and telemetry tools such as Grafana, Prometheus, or similar platforms.
  • Experience with security, trust, compliance, or reliability programs in cloud infrastructure environments.
  • Track record of improving operational processes for break/fix, incident response, or infrastructure readiness.
  • Strong technical judgment and ability to partner closely with engineering leaders while developing TPM talent.

Responsibilities

  • Lead and develop a team of Technical Program Managers working on DGX Cloud core infrastructure programs.
  • Drive execution across network, storage, trust services, security, telemetry, and break/fix operational workstreams.
  • Partner with engineering, product, operations, security, and cloud provider teams to define priorities, milestones, dependencies, and delivery plans.
  • Build clear operating rhythms for infrastructure planning, managing blocking issues, risk tracking, and cross-functional decision-making.
  • Improve visibility into infrastructure health, delivery status, blockers, and program risks through practical metrics, dashboards, and reporting.
  • Coordinate break/fix and operational readiness programs that improve reliability, response time, and customer impact management.
  • Support continuous improvement across TPM practices, helping the team standardize planning, execution, and communication across DGX Cloud infrastructure.

Benefits

  • Equity
  • Benefits