We are seeking a highly motivated and experienced Infrastructure and Platform Reliability Technical Program Manager (TPM) to lead Fleet Reliability initiatives across the full system lifecycle, from Day 0 provisioning through Day 2 steady-state operations. This role is highly cross-functional and will partner closely with the Compute, Networking, Operations, and Data Center teams to drive measurable improvements in fleet reliability, availability, and operational excellence. The TPM will operate as the central owner for reliability programs, aligning teams on priorities, defining success metrics, driving execution, and ensuring sustained improvements at scale. The TPM will own end-to-end fleet reliability outcomes from Day 0 (provisioning, validation, bring-up) through Day 2 (steady-state operations, incident reduction, lifecycle management).
Job Type
Full-time
Career Level
Mid Level
Number of Employees
501-1,000 employees