About The Position

NVIDIA's deep learning platforms lead innovation, influence multiple fields, and are embraced by top academic institutions, startups, and major Internet companies worldwide. We're looking for a seasoned and highly skilled Principal Technical Program Manager (TPM) to join our NVIDIA DGX Cloud (DGXC) team. This is an exciting opportunity for a passionate, driven, and creative individual to provide outstanding value to our DGX Cloud customers. We are seeking a TPM who has deep knowledge of observability, systems telemetry, and cloud infrastructure operations. You will play a key role collaborating with hardware/software supply teams, DGXC operations, and external Cloud Service Providers (CSPs) and NVIDIA Cloud Providers (NCPs). Together, you will develop unified telemetry guidance and confirm operational readiness worldwide.

Requirements

  • 12+ years of technical program management experience, specifically driving the planning and execution of large-scale engineering, cloud infrastructure, and observability programs within a matrixed organization.
  • Extensive practical experience in managing cloud infrastructure, ideally acquired from employment at a leading Cloud Service Provider (CSP).
  • Proven ability to act as the interface between cross-functional organizations, effectively managing complex feedback loops and resolving misaligned requirements.
  • Expert-level proficiency with Jira, Smartsheet, or similar program management tools, with the ability to confidently guide engineering teams on their effective use within an Agile framework.
  • Outstanding strategic and tactical thinking abilities, coupled with a strong capacity to build consensus and drive program success across diverse business units.
  • Excellent communication and technical presentation skills, particularly for executive audiences.
  • BS or MS in Electrical Engineering or Computer Science, or equivalent experience.

Nice To Haves

  • Comprehensive knowledge of NVIDIA architectures and interconnects, encompassing deployment, bring-up, and telemetry requirements for GPUs, NVLink, and InfiniBand.
  • Experience with technologies such as OpenTelemetry (OTel), Grafana, WarpStream, VictoriaMetrics, and Loki.
  • Familiarity with cloud platform architecture, cloud-native services, and Kubernetes.
  • A highly enthusiastic, upbeat, responsive, and passionate individual who actively identifies process improvement opportunities and guides teams through ambiguity.

Responsibilities

  • Work closely with Engineering, Infrastructure, and Software teams.
  • Lead important programs focused on telemetry and data center fleet health & management.
  • Develop core capabilities for DGX Cloud.
  • Ensure operations and advanced tenants receive the telemetry required to troubleshoot, debug, and manage AI infrastructure effectively.
  • Establish a balanced feedback loop between DGX Cloud and other organizations at NVIDIA to align and unify telemetry requirements and guidance for external partners.
  • Drive the end-to-end telemetry lifecycle for upcoming NVIDIA Cloud Providers (NCPs), ensuring requirements are committed, delivered, and ingested into a centralized telemetry platform to enable DGXC operations.
  • Participate in the early product lifecycle (Day -1 / Day 0) to examine the Plan of Record (POR) for new silicon, systems, firmware, and software architectures (e.g., VR) and ensure telemetry requirements for advanced tenants are coordinated.
  • Collaborate across technical domains (including NVLink, InfiniBand, Spectrum-X, GPU, CPU, and DPU) to establish standard telemetry operations guidance across NVIDIA.
  • Drive a program for NVIDIA’s attestation platform to verify device integrity, authenticity, and trust across the accelerated computing ecosystem.
  • Provide leadership and mentorship to the DGXC TPM organization, driving process improvements.
  • Actively improve day-to-day efficiency by demonstrating and building AI tools (e.g., incorporating Claude Co-Work, NotebookLM, Glean, and building NemoClaw agents) to automate manual workflows like Jira management.
  • Develop and execute a robust communication strategy to ensure cross-functional visibility on overall program progress, including presenting regularly to NVIDIA's executive leadership team.

Benefits

  • Highly competitive salaries
  • Comprehensive benefits package