Principal TPM - AI Infrastructure

Oracle • Seattle, WA

12d • $90,100 - $199,500 • Onsite

About The Position

The AI Infrastructure GPU Operations Team drives deployment planning, execution governance, operational readiness, reliability, and business rhythm for OCI's rapidly expanding GPU infrastructure portfolio. As Principal Technical Program Manager, you will lead cross-functional programs that connect engineering, platform, operations, business, finance, observability, SRE, network, and leadership teams across complex GPU operations initiatives. You will own operating mechanisms for regional deployment readiness, GPU fleet health, milestone tracking, executive reporting, incident and change governance, risk management, and operational handoff across multiple concurrent GPU operations programs.

This role requires strong program discipline, business analytics capability, and the ability to turn ambiguous technical and operational inputs into clear priorities, metrics, decisions, and action plans. You will also improve the way the organization scales by strengthening dashboards, telemetry, documentation, onboarding, playbooks, repeatable processes, and the practical use of AI to improve operations productivity.

The ideal candidate brings crisp communication, strong ownership, and pragmatic simplification to high-visibility GPU operations programs where disciplined execution, customer impact, and measurable reliability outcomes matter. You are a structured, data-driven program leader who values simplicity, scalability, reliability, and clear operational mechanisms. You thrive in collaborative environments, communicate crisply with senior stakeholders, and drive consistent execution through ownership, metrics, and disciplined follow-through. You combine strategic clarity with enough technical and operational depth to help teams deliver reliable OCI AI Infrastructure GPU Operations while continuously improving the processes, telemetry, and automation that support it.

Requirements

Strong program discipline
Business analytics capability
Ability to turn ambiguous technical and operational inputs into clear priorities, metrics, decisions, and action plans
Crisp communication
Strong ownership
Pragmatic simplification
Structured, data-driven program leadership
Values simplicity, scalability, reliability, and clear operational mechanisms
Thrives in collaborative environments
Communicates crisply with senior stakeholders
Drives consistent execution through ownership, metrics, and disciplined follow-through
Combines strategic clarity with technical and operational depth

Responsibilities

Lead cross-functional programs connecting engineering, platform, operations, business, finance, observability, SRE, network, and leadership teams across complex GPU operations initiatives.
Own operating mechanisms for regional deployment readiness, GPU fleet health, milestone tracking, executive reporting, incident and change governance, risk management, and operational handoff across multiple concurrent GPU operations programs.
Improve organizational scaling by strengthening dashboards, telemetry, documentation, onboarding, playbooks, repeatable processes, and the practical use of AI to improve operations productivity.
Drive disciplined execution, customer impact, and measurable reliability outcomes in high-visibility GPU operations programs.