Site Reliability Engineer

Alloy•New York, NY

$151,000 - $191,000•Hybrid

About The Position

Alloy helps companies that offer financial products solve the identity risk problem, enabling them to outpace fraud and confidently serve more people around the world. More than 800 of the world’s largest financial institutions and fintechs turn to Alloy to take control of fraud, credit, and compliance risk, and to grow with the clearest picture of their customers. Through our values (Be Bold, Get Scrappy, Collaborate, and Celebrate Our Differences), we are creating a workplace where you can grow, thrive, and belong. Year after year, Alloy has been recognized as one of Inc. Magazine’s Best Workplaces, one of Forbes’ America’s Best Startup Employers, and a Best Fintech to Work For by American Banker.

About The Team

Alloy’s Infrastructure Team is a small team of 6 engineers responsible for a large and growing infrastructure footprint: 15+ Kubernetes clusters, 100+ databases, dozens of services, and complex data organization. Our challenge isn’t just scale; it’s making that scale reliable, secure, and operable with less manual work. We’re looking for engineers who enjoy turning complex, fragile systems into automated, self-service platforms with strong safety guarantees.

What You'll Be Doing

Reporting to the Engineering Manager of Infrastructure, you'll take on the work described under Responsibilities below.

Requirements

5+ years of experience in infrastructure, SRE, or software engineering roles
Strong software engineering skills—you build systems, not just scripts
Experience managing production infrastructure at scale (cloud + containerized systems)
Experience with Infrastructure as Code (e.g., Terraform)
Experience running and troubleshooting containerized distributed systems (e.g., Docker, Kubernetes)
Experience with observability and debugging tools (Datadog, CloudWatch, ELK/EFK, etc.)
Proficiency in at least one programming language (Python, Go, JavaScript, etc.)
Experience participating in on-call rotations and improving systems based on incidents
Strong communication and collaboration skills

Nice To Haves

Experience running Kubernetes in production at scale
Deep familiarity with AWS
Experience building internal platforms or developer tooling
Background in distributed systems or large-scale data systems

Responsibilities

Design and build systems to automate infrastructure management at scale (provisioning, upgrades, migrations)
Reduce operational toil by turning manual processes into reliable, repeatable workflows
Build internal tooling and platforms that enable safe self-service changes for other engineers
Improve the reliability and resilience of our infrastructure (Kubernetes, databases, services)
Implement and evolve systems for deploying and running applications in Kubernetes
Contribute to architecture decisions across infrastructure, reliability, and security
Write and review production-quality code
Participate in on-call rotations—but focus on building systems that prevent incidents, not just respond to them

Benefits

Unlimited PTO and flexible work policy
Employee stock options
Medical, dental, and vision plans, with HSA (including a monthly employer contribution) and FSA options
401(k) with a 100% match on contributions up to 4% of annual compensation
Eligible new parents receive 16 weeks of paid parental leave
Home office stipend for new employees
Annual Learning & Development stipend
Well-being benefits include access to ClassPass, OneMedical, UrbanSitter, and Spring Health
Hybrid work environment: employees are expected to work from our HQ in Union Square, Manhattan, Tuesday through Thursday. Catered lunches from a variety of local restaurants and frequent employee-organized cultural events contribute to our positive office energy. On Mondays and Fridays, most employees work from home over Zoom, while some take advantage of the quieter office.