The Principal Site Reliability Engineer partners with development teams to design availability and resiliency patterns for applications and infrastructure. The role supports the company’s commitment to risk management and to protecting the integrity and confidentiality of systems and data.

The engineer builds automation and tooling for application management, including deployments, configuration changes, and disaster recovery scenarios. They design, implement, and evangelize observability and monitoring systems that proactively detect problems and identify their causes. They continuously evaluate application capacity, provide statistics to Product/Business teams, and recommend efficient scaling paths for future needs. They are also expected to identify performance bottlenecks and collaborate with cross-functional teams to troubleshoot and resolve issues.

The role serves as the technical liaison for the application, providing documentation and runbooks to Level 1 and Level 2 teams. Participation in a 24x7 on-call rotation is required.

The engineer champions excellent processes, taking the initiative to develop repeatable patterns and standard, reusable work across teams. They work directly with application development teams to provide feedback and technical requirements throughout the software development lifecycle, and they understand, implement, and support the adoption of best-practice microservice design patterns and other modern software development and reliability approaches and techniques. The role requires a thought leader: a senior point of expertise on site reliability engineering issues, industry trends, and developing technologies, as well as a role model, coach, and mentor to other team members.
Job Type
Full-time
Career Level
Principal