Site Reliability Engineer

Alloy
New York, NY
$151,000 - $191,000
Hybrid

About The Position

Alloy helps solve the identity risk problem for companies that offer financial products by enabling them to outpace fraud and confidently serve more people around the world. Over 800 of the world’s largest financial institutions and fintechs turn to Alloy to take control of fraud, credit, and compliance risk, and grow with the clearest picture of their customers. Through our values (Be Bold, Get Scrappy, Collaborate, and Celebrate Our Differences), we are creating a workplace where you can grow, thrive, and belong. Year after year, we have been named one of Inc. Magazine’s Best Workplaces, one of Forbes America’s Best Startup Employers, and Best Fintech to Work For by American Banker.

About the team

Alloy’s Infrastructure Team is a small team (6 engineers) responsible for a large and growing infrastructure footprint: 15+ Kubernetes clusters, 100+ databases, dozens of services, and complex data organization. Our challenge isn’t just scale; it’s making that scale reliable, secure, and operable with less manual work. We’re looking for engineers who enjoy turning complex, fragile systems into automated, self-service platforms with strong safety guarantees.

What you'll be doing

You'll report to the Engineering Manager of Infrastructure. The day-to-day work is listed under Responsibilities below.

Requirements

  • 5+ years of experience in infrastructure, SRE, or software engineering roles
  • Strong software engineering skills—you build systems, not just scripts
  • Experience managing production infrastructure at scale (cloud + containerized systems)
  • Experience with Infrastructure as Code (e.g., Terraform)
  • Experience running and troubleshooting distributed systems (Docker/Kubernetes)
  • Experience with observability and debugging tools (Datadog, CloudWatch, ELK/EFK, etc.)
  • Proficiency in at least one programming language (Python, Go, JavaScript, etc.)
  • Experience participating in on-call rotations and improving systems based on incidents
  • Strong communication and collaboration skills

Nice To Haves

  • Experience running Kubernetes in production at scale
  • Deep familiarity with AWS
  • Experience building internal platforms or developer tooling
  • Background in distributed systems or large-scale data systems

Responsibilities

  • Design and build systems to automate infrastructure management at scale (provisioning, upgrades, migrations)
  • Reduce operational toil by turning manual processes into reliable, repeatable workflows
  • Build internal tooling and platforms that enable safe self-service changes for other engineers
  • Improve the reliability and resilience of our infrastructure (Kubernetes, databases, services)
  • Implement and evolve systems for deploying and running applications in Kubernetes
  • Contribute to architecture decisions across infrastructure, reliability, and security
  • Write and review production-quality code
  • Participate in on-call rotations—but focus on building systems that prevent incidents, not just respond to them

Benefits

  • Unlimited PTO and flexible work policy
  • Employee stock options
  • Medical, dental, vision plans with HSA (monthly employer contribution) and FSA options
  • 401k with 100% match up to 4% of annual employee compensation
  • Eligible new parents receive 16 weeks of paid parental leave
  • Home office stipend for new employees
  • Annual Learning & Development stipend
  • Well-being benefits include access to ClassPass, OneMedical, UrbanSitter, and Spring Health
  • Hybrid work environment: employees are expected to work Tuesdays through Thursdays from our HQ in Union Square, Manhattan. Tasty lunches catered from a variety of local restaurants and frequent employee-organized cultural events contribute to our positive office energy. On Mondays and Fridays, most employees work from home over Zoom, while some take advantage of the quieter office.