Capacity TPM

Cerebras Systems • Toronto, ON

About The Position

Cerebras Systems builds the world's largest AI chip, 56 times larger than GPUs. This architecture allows Cerebras to deliver industry-leading training and inference speeds, over 10 times faster than GPU-based hyperscale cloud inference services. This order-of-magnitude increase in speed is transforming the user experience of AI applications, unlocking real-time iteration and increasing intelligence via additional agentic computation. Cerebras works with the leading model labs, global enterprises, and cutting-edge AI-native startups. OpenAI recently announced a multi-year partnership with Cerebras to deploy 750 megawatts of scale, transforming key workloads with ultra high-speed inference. Cerebras serves billions of inference tokens per day to customers like Cognition, AlphaSense, Mistral, IFM, Block, and others, running on the world's largest AI accelerators.

Capacity is the heartbeat of this business: every model deployment, every customer commit, and every SLO breach lands on a finite set of wafers, GPUs, and datacenter racks. The Capacity TPM owns end-to-end capacity planning, allocation, and reporting for the Inference Service org.

Requirements

5+ years of technical program management (TPM) or product operations experience in cloud infrastructure, large-scale ML serving, or hyperscaler capacity planning.
Comfort with the inference serving stack: model replicas, batching, prefill/decode, KV cache, GPU and accelerator scheduling.
Strong data fluency: SQL, Grafana, and basic Python or Flux, enough to pull your own numbers without waiting for an analyst.
Track record of running a recurring cross-functional ritual involving senior engineers and the leadership team (LT).
Direct experience with AI accelerator fleet operations, such as Habana, TPU pods, Inferentia, or Trainium.
Familiarity with Kubernetes, HAProxy, InfluxDB, Loki, and the FastAPI-based control plane on AWS EKS.
Hardware supply chain familiarity (NVIDIA NVL72, DGX delivery cycles, datacenter colo logistics).

Responsibilities

Run the Monday capacity review (Inference Platform, Cluster Mgmt, Customer PMs, Hardware Procurement, Datacenter Ops).
Update the capacity model after major events: new customer commits, hardware deliveries, postmortems, model launches.
Translate sales asks into "yes by [date], no, or yes if we drop X" within 2 business days.
Maintain the Jira epics and Confluence pages that the broader org uses to plan against.
Drive continuous improvement and stakeholder adoption of the new capacity management platform.
Drive new datacenter bring-ups, cluster upgrades, and other related tasks in close partnership with the deployment and AIOps teams.