Cerebras Systems builds the world's largest AI chip, 56 times larger than GPUs. This architecture allows Cerebras to deliver industry-leading training and inference speeds, over 10 times faster than GPU-based hyperscale cloud inference services. This order-of-magnitude increase in speed is transforming the user experience of AI applications, unlocking real-time iteration and increasing intelligence via additional agentic computation. Cerebras works with leading model labs, global enterprises, and cutting-edge AI-native startups. OpenAI recently announced a multi-year partnership with Cerebras to deploy 750 megawatts of capacity, transforming key workloads with ultra-high-speed inference. Cerebras serves billions of inference tokens per day to customers such as Cognition, AlphaSense, Mistral, IFM, and Block, running on the world's largest AI accelerators.

Capacity is the heartbeat of this business: every model deployment, every customer commit, and every SLO breach lands on a finite set of wafers, GPUs, and datacenter racks. The Capacity TPM owns end-to-end capacity planning, allocation, and reporting for the Inference Service org.
Job Type: Full-time
Career Level: Senior
Education Level: No Education Listed