Principal TPM - AI Infrastructure

Oracle
Seattle, WA
$90,100 - $199,500
Onsite

About The Position

The AI Infrastructure GPU Operations Team drives deployment planning, execution governance, operational readiness, reliability, and business rhythm for OCI's rapidly expanding GPU infrastructure portfolio.

As Principal Technical Program Manager, you will lead cross-functional programs that connect engineering, platform, operations, business, finance, observability, SRE, network, and leadership teams across complex GPU operations initiatives. You will own operating mechanisms for regional deployment readiness, GPU fleet health, milestone tracking, executive reporting, incident and change governance, risk management, and operational handoff across multiple concurrent GPU operations programs. This role requires strong program discipline, business analytics capability, and the ability to turn ambiguous technical and operational inputs into clear priorities, metrics, decisions, and action plans. You will also improve the way the organization scales by strengthening dashboards, telemetry, documentation, onboarding, playbooks, repeatable processes, and the practical use of AI to improve operations productivity.

The ideal candidate brings crisp communication, strong ownership, and pragmatic simplification to high-visibility GPU operations programs where disciplined execution, customer impact, and measurable reliability outcomes matter. You are a structured, data-driven program leader who values simplicity, scalability, reliability, and clear operational mechanisms. You thrive in collaborative environments, communicate crisply with senior stakeholders, and drive consistent execution through ownership, metrics, and disciplined follow-through. You combine strategic clarity with enough technical and operational depth to help teams deliver reliable OCI AI Infrastructure GPU Operations while continuously improving the processes, telemetry, and automation that support it.

Requirements

  • Strong program discipline
  • Business analytics capability
  • Ability to turn ambiguous technical and operational inputs into clear priorities, metrics, decisions, and action plans
  • Crisp communication
  • Strong ownership
  • Pragmatic simplification
  • Structured, data-driven program leadership
  • Values simplicity, scalability, reliability, and clear operational mechanisms
  • Thrives in collaborative environments
  • Communicates crisply with senior stakeholders
  • Drives consistent execution through ownership, metrics, and disciplined follow-through
  • Combines strategic clarity with technical and operational depth

Responsibilities

  • Lead cross-functional programs connecting engineering, platform, operations, business, finance, observability, SRE, network, and leadership teams across complex GPU operations initiatives.
  • Own operating mechanisms for regional deployment readiness, GPU fleet health, milestone tracking, executive reporting, incident and change governance, risk management, and operational handoff across multiple concurrent GPU operations programs.
  • Improve organizational scaling by strengthening dashboards, telemetry, documentation, onboarding, playbooks, repeatable processes, and the practical use of AI to improve operations productivity.
  • Drive disciplined execution, customer impact, and measurable reliability outcomes in high-visibility GPU operations programs.

Benefits

  • Flexible medical
  • Life insurance
  • Retirement options
  • Volunteer programs