As a Principal Site Reliability Engineer at Kandji, you will play a critical role in ensuring the reliability, scalability, and performance of our platform. In this strategic position, you'll work cross-functionally to build and evolve the systems, tools, and processes that keep our services resilient and performant, especially as we scale to meet the demands of a growing customer base. You'll bring a deep understanding of distributed systems, incident management, observability, and automation. Your experience with AWS, Kubernetes, and Infrastructure-as-Code (Terraform preferred) will help drive efforts to proactively identify and eliminate reliability risks, reduce toil through automation, and establish engineering best practices across teams. This role provides the opportunity to shape the culture and architecture of reliability at Kandji, partnering closely with engineering, infrastructure, and product teams to build systems that are not only functional but also fault-tolerant and maintainable.