About The Position

NVIDIA's deep learning platforms drive innovation across many fields and are used by leading academic institutions, startups, and major Internet companies worldwide. We're looking for a seasoned, highly skilled Principal Technical Program Manager (TPM) to join our NVIDIA DGX Cloud team. This is an exciting opportunity for a passionate, driven, and creative individual to deliver outstanding value to our DGX Cloud customers. The ideal candidate has deep knowledge of observability, systems telemetry, and cloud infrastructure operations. You will play a key role in collaborating with hardware and software supply teams, DGXC operations, and external partners, including Cloud Service Providers (CSPs) and NVIDIA Cloud Providers (NCPs). Together, you will develop unified telemetry guidance and ensure operational readiness worldwide.

Requirements

  • 12+ years of technical program management experience, specifically driving the planning and execution of large-scale engineering, cloud infrastructure, and observability programs within a matrixed organization.
  • Extensive hands-on experience managing cloud infrastructure, ideally gained at a leading Cloud Service Provider (CSP).
  • Proven ability to act as the interface between cross-functional organizations, effectively managing complex feedback loops and resolving misaligned requirements.
  • Expert-level proficiency with Jira, Smartsheet, or similar program management tools, with the ability to confidently guide engineering teams on their effective use within an Agile framework.
  • Outstanding strategic and tactical thinking abilities, coupled with a strong capacity to build consensus and drive program success across diverse business units.
  • Excellent communication and technical presentation skills, particularly for executive audiences.
  • BS or MS in Electrical Engineering or Computer Science, or equivalent experience.

Nice To Haves

  • Comprehensive knowledge of NVIDIA architectures and interconnects, encompassing deployment, bring-up, and telemetry requirements for GPUs, NVLink, and InfiniBand.
  • Experience with technologies such as OpenTelemetry (OTel), Grafana, WarpStream, VictoriaMetrics, and Loki.
  • Familiarity with cloud platform architecture, cloud-native services, and Kubernetes.
  • A highly enthusiastic, upbeat, responsive, and passionate individual who actively identifies process improvement opportunities and guides teams through ambiguity.

Responsibilities

  • Work closely with Engineering, Infrastructure, and Software teams.
  • Lead important programs focused on telemetry and data center fleet health & management.
  • Develop core capabilities for DGX Cloud.
  • Ensure operations teams and advanced tenants receive the telemetry they need to troubleshoot, debug, and manage AI infrastructure effectively.
  • Establish a balanced feedback loop between DGX Cloud and other NVIDIA organizations to align and unify telemetry requirements and guidance for external partners.
  • Drive the end-to-end telemetry lifecycle for upcoming NVIDIA Cloud Providers (NCPs), ensuring requirements are committed, delivered, and ingested into a centralized telemetry platform to enable DGXC operations.
  • Participate early in the product lifecycle (Day -1 / Day 0) to review the Plan of Record (POR) for new silicon, systems, firmware, and software architectures (e.g., VR), ensuring telemetry requirements for advanced tenants are coordinated.
  • Collaborate across technical domains (including NVLink, InfiniBand, Spectrum-X, GPU, CPU, and DPU) to establish standard telemetry operations guidance across NVIDIA.
  • Drive a program for NVIDIA’s attestation platform to verify device integrity, authenticity, and trust across the accelerated computing ecosystem.
  • Provide leadership and mentorship to the DGXC TPM organization, driving process improvements.
  • Actively improve day-to-day efficiency by demonstrating and building AI tools (e.g., Claude Co-Work, NotebookLM, Glean, and NemoClaw agents) that automate manual workflows such as Jira management.
  • Develop and execute a robust communication strategy that ensures cross-functional visibility into overall program progress, including regular presentations to NVIDIA's executive leadership team.

Benefits

  • Highly competitive salaries
  • Comprehensive benefits package
  • Equity

What This Job Offers

Job Type: Full-time
Career Level: Senior
Education Level: Associate degree
Number of Employees: 5,001-10,000

© 2026 Teal Labs, Inc