The Common Services organization at CoreWeave is responsible for the shared platforms, APIs, and foundational services that power our AI cloud products and internal engineering teams. From authentication and authorization to core platform primitives and developer experience tooling, this organization ensures that the rest of CoreWeave can build, ship, and operate reliably at scale.

As Reliability Lead, Common Services, you will establish and lead the Reliability Engineering and production operations practice for this organization. You’ll partner closely with engineering leaders and teams across Common Services to define how we build, release, monitor, and operate critical services, raising the bar on reliability, availability, and operational excellence across the board.

You will be responsible for defining the reliability strategy, processes, and standards for the Common Services portfolio and for driving consistent, high-quality operational practices across multiple teams. You’ll monitor production incidents within Common Services and work directly with your partner teams to design systems that are reliable, observable, and supportable. Your day-to-day will blend hands-on technical work and cross-functional leadership to drive continuous improvement of Common Services production operations.
Job Type: Full-time
Career Level: Mid Level
Education Level: No Education Listed