Quality, reliability, and uptime are foundational to scaling Cerebras systems. We are seeking an engineer to define and build our prognostics and health monitoring (PHM) capability: developing frameworks to monitor, assess, and predict hardware health across our fleet. In this role, you will transform telemetry and operational data into actionable insights and automated responses, enabling early detection of degradation, accurate failure prediction, and proactive intervention to keep systems highly available, performant, and resilient. This is a highly cross-functional role spanning reliability engineering, data science, and system software, with broad influence across hardware, software, and fleet operations.
Job Type: Full-time
Career Level: Senior
Education Level: Associate degree
Number of Employees: 251-500 employees