About the position
We are seeking an experienced Senior Site Reliability Engineer to join our team and contribute to the automation, performance, and reliability of our cloud-based infrastructure. As part of the compute teams at Zapier, you will own and improve our infrastructure, automation, and tooling. This role offers the opportunity to make a significant impact and take our infrastructure to the next level in a fast-growing, profitable startup environment. The ideal candidate has a strong background in systems administration, systems engineering, or software development, with expertise in Site Reliability Engineering or DevOps. Proficiency in cloud-based infrastructure, infrastructure as code tools, and a programming language such as Python or Go is required. Effective communication skills and a commitment to Zapier's values are also essential for success in this role.
Responsibilities
- Design and deploy AWS infrastructure using infrastructure as code tools (Terraform, Helm, etc.) across multiple accounts
- Contribute to Kubernetes clusters (EKS) and serverless functions (Lambda)
- Evaluate and recommend new tools and technologies to the organization
- Partner with teams to solve infrastructure and design problems
- Build services to integrate systems, process high-traffic workloads, and perform migrations
- Apply SRE principles to identify and address contributing factors to unreliability
- Improve application reliability using a software engineering approach to operations
- Develop internal tools and systems to help engineering teams ship better software
- Impact every engineering team in the organization and use a broad set of technologies
- Maintain relationships and communicate effectively with teams
- Build new features and services to support teams and improve site reliability
- Solve problems and learn from failures with the support of the team
- Automate solutions to problems rather than relying on manual effort
Requirements
- 4 years of experience at SaaS companies in systems administration, systems engineering, or software development
- At least 2 years of experience in Site Reliability Engineering or DevOps
- Experience in designing or maintaining highly available, cloud-based infrastructure in AWS or another cloud offering
- Familiarity with infrastructure as code tools and best practices for reliability and observability
- Proficiency in coding with languages like Python or Go
- Ability to solve complex systems challenges and improve performance
- Strong communication skills, both written and verbal
- Alignment with Zapier's values and ability to thrive in a collaborative setting
Benefits
- Competitive compensation in the technology sector
- Equitable pay practices based on competency
- Simple and transparent pay structure
- Pay ranges specified for the role
- Compensation package finalized based on experience and competencies
- Total Rewards program
- Non-standard application process promoting inclusion and equity
- Prompt communication regarding application status
- Equal opportunity employer
- Commitment to diversity, inclusion, belonging, and equity
- Consideration of applicants with criminal histories
- Reasonable accommodations for individuals with disabilities
- All-remote company with restrictions on permanent work locations