About The Position

NVIDIA's AI Factory Infrastructure team develops global reference designs, de-risks technologies needed for the next generation of compute and network products, and builds AI infrastructure at scale to validate solutions. As a TPM on the team, you'll lead highly multi-functional teams spanning hardware, software, and facility infrastructure to deliver solutions and infrastructure. This role offers a unique blend of developing future solutions and products while building infrastructure at scale.

Requirements

  • Outstanding long-term planning and execution skills to carry out data center lifecycle planning, including large-scale AI factory buildouts and expansions.
  • Experience managing end-to-end data center deployments for high-density AI infrastructure, including commissioning, readiness reviews, turn-up, and operational handoff.
  • Demonstrated ability to coordinate across colocation providers, general contractors, utilities, and OEMs to deliver complex electrical and mechanical scope (e.g., at 100MW+ campus scale).
  • Strong technical and program leadership across power delivery, liquid cooling, networking, and compute/platform teams to define acceptance criteria and ensure performance and reliability targets are met.
  • 12+ years of experience providing program and project management leadership for data center projects covering construction of mechanical, electrical, and plumbing with large-scale server, storage, and network deployments.
  • BS or MS degree in Engineering (or equivalent experience).

Nice To Haves

  • In-depth knowledge of data center technologies, spanning infrastructure (hardware and software) and facilities (electrical and mechanical).
  • Familiarity with NVIDIA’s AI compute technology stack and ability to translate platform requirements into data center infrastructure designs (power delivery, liquid cooling, space, and network topology) at scale.
  • Experience with colocation data center environments.

Responsibilities

  • Collaborate with product owners and technical leads to identify and collect requirements for next-generation AI Factories.
  • Build, supervise, and complete long-term programs including schedules, resourcing, and checkpoints.
  • Work with data center and hardware teams to find creative solutions to hard problems, and co-develop solutions and mitigation strategies.
  • Lead planning with key internal partners on capacity demands with engineering roadmaps and data center expansions.
  • Own end-to-end delivery of 100MW+ AI factory data center deployments, from construction readiness through commissioning, turn-up, and handoff to operations.
  • Coordinate general contractors, colocation providers, utilities, and OEMs to align electrical/mechanical scope, long-lead equipment, and site logistics for large-scale AI cluster deployments.
  • Drive integrated readiness reviews and acceptance criteria across power, liquid cooling, networking, and platform/hardware teams to ensure performance and reliability targets are met for AI factory applications.
  • Develop program plans for government grants and initiatives.
  • Translate program requirements into Basis of Design documents.
  • Bring together team members and foster a collaborative approach to delivery while holding team members accountable to action items and timelines.

Benefits

  • Equity
  • Benefits