NVIDIA's deep learning platforms are at the forefront of innovation and are widely adopted by leading academic institutions, startups, and major Internet companies around the world. We're seeking an accomplished, highly skilled Technical Program Manager (TPM) to join our NVIDIA DGX Cloud team. This is an exciting opportunity for a passionate, results-oriented, and creative individual to deliver exceptional value to our DGX Cloud customers. We are specifically looking for a TPM with extensive experience in cloud infrastructure bring-up with external partners. You'll be instrumental in partnering with emerging NVIDIA Cloud Providers (NCPs) and internal engineering teams to help build AI capacity and infrastructure across the globe.

What you'll be doing:

As a DGX Cloud Technical Program Manager, you'll be a key partner to our Engineering, Infrastructure, and Software teams and their leadership, driving critical programs related to AI capacity enablement and management. You'll play a pivotal role in developing and maturing foundational capabilities and processes for DGX Cloud, spanning critical areas such as cluster and capacity bring-up, including the CPU, storage, networking, and compute requirements needed to support GPUs. This is a dynamic, fast-paced environment where TPMs are expected to apply fungible skill sets to a range of high-impact programs across DGX Cloud.

- Collaborate closely with storage engineering and network engineering teams to define and communicate requirements to Cloud Service Providers (CSPs) and NVIDIA Cloud Providers (NCPs). Drive alignment on a plan of record (POR) for capacity blocks based on workload needs.
- Drive early engagement with CSPs and NCPs to understand their managed storage and network solutions and to influence alignment with the NVIDIA Cloud roadmap.
- Gather technical requirements, develop comprehensive roadmaps, establish clear milestones, and ensure adherence to our Product Lifecycle (PLC) process.
- Manage ongoing capacity operations and the engineering engagement with CSP and NCP partners, collaborating closely with an SRE lead. Focus on availability, maintenance, and other critical performance indicators.
- Partner closely with teams within NVIDIA to understand workload requirements and the related hardware and infrastructure needs, including speeds and feeds, to optimize infrastructure readiness with CSPs and NCPs.
- Leverage Jira and other program management platforms to instill rigor and structure in the management of engineering deliverables.
- Identify and drive opportunities to adopt third-party and in-house cloud infrastructure solutions for deployments, support, security, compliance, and observability across DGX Cloud.
- Establish key performance indicators (KPIs) and quantitatively demonstrate the value and impact delivered by your programs.
- Proactively identify, resolve, and mitigate risks and issues that could affect scope, schedule, and quality across all program aspects.
- Cultivate a culture of continuous improvement, consistently identifying opportunities for process enhancements within our cloud infrastructure operations.
Job Type
Full-time
Career Level
Mid Level