We're Celonis, the global leader in Process Intelligence technology and one of the world's fastest-growing SaaS firms. We believe there is a massive opportunity to unlock productivity by placing AI, data and intelligence at the core of business processes - and for that, we need your help. Care to join us? The Team As a member of our Reliability Engineering team, you will play a critical role in ensuring the health, performance, and resilience of our platform. The team applies advanced software engineering and Site Reliability Engineering (SRE) principles to drive system reliability, scalability, and operational excellence across the organization. The Role Join a highly technical, collaborative, and innovation-driven team that blends Site Reliability Engineering with modern Software Engineering practices to build resilient and scalable systems. Lead reliability efforts for a fleet of 80+ FedRAMP-compliant microservices running on Kubernetes, applying SRE principles to drive observability, automation, and incident prevention. Develop and enforce SLOs, SLAs, and error budgets to drive reliability-focused development. Provide mentorship and technical leadership across the SRE and engineering teams. Own high-priority application incident escalations, performing deep technical analysis and restoration within defined SLOs, while continuously improving detection and response mechanisms. Engineer solutions to enhance the availability, latency, and performance of production services—automating manual processes to eliminate toil and scale operational efficiency. Collaborate closely with platform and application engineering teams to conduct post-incident reviews, extract insights, and implement systemic changes that improve overall reliability.