Site Reliability Engineer

Mercor, San Francisco, CA
Onsite

About The Position

As a Site Reliability Engineer (SRE) at Mercor, you’ll own production reliability across our most critical systems, partnering directly with infrastructure leadership. You’ll play a foundational role in building our SRE function from the ground up and shaping how Mercor operates large-scale, high-availability systems.

Requirements

  • Experience doing true SRE work (not just operations) across multiple roles or companies.
  • Deep familiarity with SRE practices as popularized by Google (e.g., error budgets, reliability vs. risk trade-offs, large-scale distributed systems).
  • 5+ years of SRE experience; 15+ years of overall experience is ideal for this first SRE hire.
  • Proven success operating systems at scale, with a strong understanding of the challenges of large, distributed production environments.
  • Strong collaboration skills; able to work efficiently with cross-functional engineering teams.
  • Ability to drive cultural change around reliability while remaining hands-on in building and fixing systems.
  • Comfort working in high-intensity, high-availability environments where uptime and production quality are critical.

Nice To Haves

  • Experience as a founding SRE or early SRE hire, standing up SRE practices and orgs from scratch.
  • Hands-on experience in the AWS ecosystem, Kubernetes, and modern IaC tooling (Terraform, Spacelift, etc.).

Responsibilities

  • Own reliability and production safety for core shared services and customer-facing systems.
  • Partner directly with infrastructure leadership to define SRE priorities, reliability standards, and the production safety roadmap.
  • Repair and improve how our production systems are structured so they are stable, resource-efficient, isolated, and well-observed.
  • Introduce and champion modern SRE practices (e.g., incident response, postmortems, SLIs/SLOs) across engineering teams.
  • Collaborate with leverage engineering and applied AI teams to ensure sustainable growth.
  • Represent SRE best practices internally and help teams onboard onto production in a way that is safe, scalable, and consistent with SRE principles.