Senior Site Reliability Engineer

Zip Co Limited, posted about 5 hours ago

$150,000 - $170,000/Yr

Full-time • Mid Level

Remote

1,001-5,000 employees

Senior Site Reliability Engineer with deep expertise in optimizing system reliability, performance, and scalability across cloud environments (Azure, Kubernetes, Service Mesh). Proficient in defining, measuring, and improving Service Level Objectives (SLOs), managing error budgets, and automating toil to drive operational excellence in a blameless culture.

Remote-first opportunity for US-based employees with the option to work in-person out of our Manhattan office.

Start your adventure with Zip

Join Zip’s Engineering function and put your name to solving fascinating challenges at scale in an agile, test-driven development environment. If you value good domain-driven design and enjoy delivering quality work at pace, you’ll be a great fit with the squads responsible for building cloud-native software applications that serve millions of customers and process billions of dollars in payments.

We are seeking a seasoned leader with extensive senior leadership experience to spearhead our Site Reliability Engineering (SRE) initiatives and mentor our engineering team. This role requires a deep understanding of operational excellence, managing production risk, and the ability to lead reliability initiatives from inception to completion. Collaboration is key in our environment, so we need someone who excels in a team-oriented setting. As we aim to double our footprint this year, you will encounter complex challenges that demand innovative solutions and strategic insight to maintain and improve system reliability at scale. If you are passionate about driving infrastructure excellence and nurturing talent within a dynamic SRE team, we would love to hear from you.

Interesting problems you’ll get to solve

Work within an infrastructure that is capable of handling billions of dollars in transactions quickly and securely.
Collaborate with engineering teams to design and deploy highly reliable and scalable integrated solutions for Fortune 100 companies.
Develop automated upgrade systems for a constantly evolving Azure architecture.
Maintain a complex event sourcing environment using CQRS principles.
Develop self-service tooling and automation (e.g., using Terraform, Atlantis, ArgoCD) to empower development teams to operate services within established reliability standards and reduce toil.
Monitor for service health and create automatic recoveries using metrics-based canaries to ensure reliable code deployment.

What you’ll get in return

Zip is a place where you’ll get out what you put in. The newness of our sector means we need to move at pace and embrace change, and our promise to you when you join the team is that you’ll feel empowered and trusted to make big things happen quickly. We want you to feel welcome and as though you have the support to be yourself, and care for yourself at work. Because it’s important to us that you make the most of the opportunities you’ll get to grow your skills and your career, and be surrounded by smart, friendly people and leaders that have your back. We think these are just some of the best things about being a Zipster.

We will also offer you:

Flexible working culture
Incentive programs
20 days PTO every year
Generous paid parental leave
Leading family support policies
Company-sponsored 401k match
Learning and wellness subscription stipend
Beautiful Union Square office with a casual dress code
Industry-leading, employer-sponsored insurance for you and your dependents, with several 100% Zip-covered choices available

The Pay Range for this position: $150,000-170,000 based on the industry benchmark for position, function, level and Zip's compensation strategies. However, actual base salary will depend on varying circumstances and individualized factors, such as job-related knowledge, skills, experience, and other objective business considerations. Subject to those same considerations, the total compensation package for this position may also include other elements, including a bonus and/or equity awards, in addition to a full range of medical, financial, and/or other benefits.

Be a part of a team that reflects the diversity of our customers

We pride ourselves on being a workplace that provides equal opportunities to people of all ages, cultural backgrounds, sexual orientations, gender identities, abilities, veteran status, and everything else that makes you unique. Equally, we’re committed to ensuring our recruitment processes are accessible and inclusive. Please let us know if there are any adjustments that need to be made to ensure you have a fair and equitable experience.

And finally…get to know us

Zip is a global ‘Buy Now, Pay Later’ company that gives our millions of customers simpler and fairer ways to pay. We are proud to be a global business built around our US and ANZ core markets working with merchant partners including Amazon, Best Buy, eBay and Uber. United by our mission, purpose and values - Customer First, Own It, Stronger Together & Change The Game - we are the next generation of payments, helping people across the globe to fearlessly take control of their financial future. We are Zip, and we are just getting started.

Before you apply, give Zip a try -> rebrand.ly/check-zip-out

Zip participates in the federal government’s E-Verify program.

10+ years of experience in a Site Reliability Engineering, Production Engineering, or equivalent role.
5+ years of experience working with Kubernetes or similar microservice architecture.
5+ years of experience working in an Azure environment.
Proven experience defining and implementing Service Level Indicators (SLIs) and Service Level Objectives (SLOs) and managing error budgets.
Experience working in an agile environment and knowledge of agile practices.
Experience with CI/CD systems, preferably using Azure DevOps or GitHub Actions.
Strong understanding of networking and routing protocols, especially those involved in Service Mesh architectures.
Experience incorporating AI tools such as ChatGPT, Cursor, Codex, or GitHub Copilot into your day-to-day work.
Must be able to work in an on-call rotation with a focus on sustainable incident response and post-mortem analysis (blameless culture).

Experience using Jira for project management and story creation is a plus.
