In this high-impact staff-level role, you will architect, design, and implement enterprise-scale infrastructure solutions supporting Web, Mobile, Backend, and Data engineering teams, while providing technical leadership across cross-functional groups.
You will define and drive adoption of reliability standards, architectural patterns, and engineering best practices across the organization, working closely with engineering and security leadership.
You will lead performance optimization initiatives, implementing sophisticated monitoring strategies and leveraging advanced analytics to ensure exceptional system reliability and performance at scale.
You will design and implement comprehensive automation frameworks for infrastructure provisioning, configuration management, and deployment processes, focusing on efficiency and scalability.
You will serve as the technical authority for incident management, establishing robust incident response frameworks, leading cross-functional response efforts, and driving systematic improvements through detailed post-incident analysis.
You will architect and implement enterprise-wide incident response strategies, including sophisticated playbooks and multi-tier escalation procedures aligned with business continuity requirements.
You will partner with engineering leadership to drive reliability improvements through advanced automated testing frameworks, fault-tolerant architectures, and comprehensive disaster recovery strategies.
You will provide technical mentorship and leadership to the broader engineering organization while contributing to the strategic direction of the SRE practice.
Job Type: Full-time
Career Level: Senior
Education Level: No Education Listed