Principal Technical Program Manager

Microsoft • Redmond, WA

About The Position

The CO+I AI Delivery team is focused on delivering various platform services to our global AI customers. This role requires deep partnership across Microsoft, as these programs are on the leading edge of innovation and include driving pilots and other experiments that shape our ongoing commitment to deliver AI platform services rapidly at cloud scale.

We are part of Cloud Operations + Innovation (CO+I), the team behind one of the world’s largest cloud infrastructures, responsible for powering all Microsoft online products and services as well as Microsoft’s “Cloud First” mission. CO+I is focused on growth, efficiency, and delivering a trusted experience to customers and partners worldwide.

We are seeking a highly capable Principal Technical Program Manager (TPM) to drive AI Deployment Acceleration across complex, large-scale environments. In this role, you will own and orchestrate end-to-end technical programs that accelerate the delivery of AI compute, platforms, and services, from concept through production at global scale. You will operate at the intersection of infrastructure, hardware, software platforms, and operations, working closely with engineering, supply chain, data center, and business stakeholders to reduce cycle time, eliminate blockers, and deliver AI capacity faster and more predictably. This role requires strong technical depth, operational rigor, and the ability to influence across organizational boundaries without direct authority.

In alignment with our Microsoft values, we are committed to cultivating an inclusive work environment for all employees to positively impact our culture every day.

Requirements

Bachelor's Degree AND 6+ years of experience in engineering, product/technical program management, data analysis, or product development, OR equivalent experience.
3+ years of experience managing cross-functional and/or cross-team projects.
Ability to meet Microsoft, customer, and/or government security screening requirements is required for this role. These requirements include, but are not limited to, the following specialized security screenings:
Microsoft Cloud Background Check: This position will be required to pass the Microsoft Cloud background check upon hire/transfer and every two years thereafter.

Nice To Haves

Proven experience leading complex, cross‑team technical programs with significant infrastructure or platform components.
Strong technical foundation in one or more of the following:
Cloud infrastructure and distributed systems
Large‑scale data center delivery projects
Hardware‑software integrations (compute, networking, storage, power, cooling)
Demonstrated ability to manage execution in ambiguous, fast‑moving environments.
Excellent written and verbal communication skills, with experience presenting to senior leadership.
Experience delivering or scaling AI, HPC, or GPU‑based platforms in production environments.
Familiarity with data center operations, hardware lifecycle management, or global deployment programs.
Experience driving cycle‑time reduction or large‑scale process transformation.
Background working with hyperscale cloud platforms or enterprise AI services.

Responsibilities

Own end‑to‑end technical programs focused on accelerating AI deployment timelines, from requirements through live production.
Drive execution across multiple parallel workstreams, ensuring alignment on scope, milestones, dependencies, risks, and outcomes.
Establish clear success metrics and mechanisms to track delivery, quality, and velocity.
Appropriately document all artifacts produced during deliberative processes, along with the consensus reached.
Partner deeply with hardware engineering, software engineering, infrastructure, networking, data center operations, and supply chain teams to unblock execution.
Act as the central point of coordination across highly interdependent teams and external partners.
Influence decision‑making with data, technical insight, and strong executive communication.
Develop deep working knowledge of AI deployment architectures, including compute (GPU/accelerators), networking, storage, racks, power, cooling, and platform readiness.
Identify technical risks early and drive mitigation strategies across hardware, firmware, software, and operational domains.
Translate complex technical concepts into clear, actionable plans for both technical and non‑technical stakeholders.
Identify bottlenecks and non‑value‑added activities across the deployment lifecycle and drive improvements to reduce time‑to‑live.
Define and implement repeatable deployment playbooks, standard operating procedures, and automation opportunities.
Leverage retrospectives and data to continuously improve deployment velocity and reliability.
Ensure platform, tooling, monitoring, and operational readiness for AI workloads prior to production cutover.
Partner with service owners to validate scalability, resiliency, security, and operational support models.
Provide crisp, data‑driven updates to senior leadership on progress, risks, mitigation plans, and business impact.
Clearly articulate tradeoffs and recommend paths forward in high‑ambiguity environments.