Technical Program Manager, AI Infrastructure

Cerebras Systems, Sunnyvale, CA

About The Position

Be part of the team that builds and operates the world's fastest AI infrastructure for training and inference. As a Technical Program Manager (TPM), you will help accelerate data center buildouts to meet rapidly growing demand for our inference service platform.

Requirements

  • Experience leading large, cross-functional infrastructure programs.
  • Experience with AI/ML, HPC, or accelerator-based infrastructure.
  • Strong understanding of data center power and cooling fundamentals.
  • Experience installing and managing network, storage, and compute devices.
  • Proven ability to define and operationalize metrics.
  • Strong written and executive-level communication skills.
  • Experience working with colocation providers and facilities teams.
  • Background in incident management, reliability, or service operations.

Nice To Haves

  • Experience running network operations teams.

Responsibilities

  • Own end-to-end technical programs for multiple data center buildouts, coordinating with partners, contractors, and internal teams.
  • Drive facility site readiness, including power and cooling, for Cerebras Wafer-Scale Engine systems.
  • Coordinate equipment delivery and hold vendors accountable for schedule and quality on rack integration and inter-rack cabling.
  • Act as the single-threaded owner across internal partners: Hardware & Systems Engineering, Network & Storage Engineering, and AI Cloud Infrastructure & Operations.
  • Enforce handover criteria between site completion, equipment deployment, and operations.
  • Own overall schedule tracking, risk identification, and risk mitigation, giving leadership clear visibility into program status.
  • Establish program governance, risk tracking, and RACI clarity.
  • Present program status, metrics, and operational risks to senior leadership.
  • Drive partner accountability on contractual milestones and commercial commitments.
  • Document repeatable processes and implement them so they scale across future data center buildouts.
  • Partner on installation, commissioning, change management, and break/fix workflows.
  • Lead incident reviews and postmortems, ensuring corrective actions are completed.
© 2024 Teal Labs, Inc