Senior / Staff Technical Program Manager - Datacenter Capacity Delivery (E2E)

Cerebras Systems•Sunnyvale, CA

Remote

About The Position

Cerebras Systems builds the world's largest AI chip, 56 times larger than GPUs. Our novel wafer-scale architecture provides the AI compute power of dozens of GPUs on a single chip, with the programming simplicity of a single device. This approach allows Cerebras to deliver industry-leading training and inference speeds and empowers machine learning users to effortlessly run large-scale ML applications, without the hassle of managing hundreds of GPUs or TPUs.

Cerebras' current customers include top model labs, global enterprises, and cutting-edge AI-native startups. OpenAI recently announced a multi-year partnership with Cerebras to deploy 750 megawatts of scale, transforming key workloads with ultra high-speed inference.

Thanks to the groundbreaking wafer-scale architecture, Cerebras Inference offers the fastest Generative AI inference solution in the world, over 10 times faster than GPU-based hyperscale cloud inference services. This order of magnitude increase in speed is transforming the user experience of AI applications, unlocking real-time iteration and increasing intelligence via additional agentic computation.

The Role

The DC Delivery E2E TPM is the single-threaded owner for delivering data center capacity from forecast → site strategy → design → construction → infrastructure readiness → go-live. You will operate as the SSOT (Single Source of Truth) for delivery milestones, risks, and capacity outcomes while orchestrating cross-functional execution across internal teams and external partners. This is a frontier-scale role where ambiguity is high, timelines are compressed, and stakes are critical to company growth.

Requirements

12-15+ years in mission-critical facilities or data center operations
Experience managing multi-site, vendor-heavy environments
Strong expertise in electrical and mechanical systems
Proven track record in improving uptime and performance
Ability to operate in high-growth, ambiguous environments with limited structure.
Strong executive presence and ability to influence without authority.
Extreme ownership mindset — you treat capacity delivery as a personal SLA.
Ability to move fast without breaking systems, balancing urgency with rigor.
Comfort operating at the intersection of hardware, real estate, and software scaling demands.
Clear, structured communicator who can turn chaos into executable plans.
Bias for action with a track record of delivering results under pressure.

Nice To Haves

Experience at hyperscalers (Google, Meta, Microsoft, AWS) or neo-cloud / AI infra companies (CoreWeave, Lambda, etc.).
Familiarity with high-density AI workloads (liquid cooling, >30kW racks, GPU clusters).
Experience with utility engagement and power delivery constraints, long-lead supply chain planning (transformers, switchgear, chillers), and commissioning and data center handover processes.

Responsibilities

End-to-End Capacity Delivery:
Own delivery of AI-optimized data center capacity (colo, build-to-suit, retrofits, and owned facilities) from pre-contract planning through operational readiness.
Deliver MW-scale infrastructure aligned to aggressive GPU/AI system deployment targets.
Drive clarity from ambiguity: translate high-level demand signals into executable delivery programs.

Program Structuring & Execution:
Decompose complex build programs into workstreams with clear owners, milestones, and deliverables.
Build integrated plans spanning real estate, power/energy, design, procurement, construction, and deployment.
Establish critical path visibility and aggressively manage schedule compression.

Cross-Functional Leadership:
Orchestrate execution across real estate & site selection; power & energy strategy (utilities, PPAs, interconnects); data center design (MEP, liquid cooling, high-density racks); supply chain & long-lead equipment procurement; construction & commissioning; infrastructure deployment (rack/cluster install); networking & backbone connectivity; and security, compliance, and operations readiness.
Act as the primary interface with colocation providers, EPCs, utilities, and key vendors.

Risk, Cost & Governance:
Identify and drive resolution of critical risks, constraints, and blockers across power, equipment, permitting, and supply chain.
Own and maintain program budgets, CapEx forecasts, and capital allocation narratives.
Provide structured updates and escalation paths to executive leadership.

Demand & Capacity Planning Integration:
Partner with capacity planning, AI infrastructure, and finance teams to translate model demand into site-level capacity strategies.
Align build plans with power availability, network topology, and hardware rollout schedules.
Continuously optimize for time-to-capacity and cost-per-MW / cost-per-GPU deployed.

Operational Excellence:
Drive E2E improvements in delivery through standardization of build and commissioning processes, implementation of program tooling and dashboards, and post-mortems and lessons-learned loops.
Establish scalable mechanisms to support rapid global expansion.

Communication & Leadership:
Serve as SSOT for program health, milestones, and risks.
Deliver concise, high-signal updates to senior executives.
Operate effectively across distributed teams with up to 50% travel.