About The Position

StarCompliance is on a mission to make compliance simple and easy. Trusted globally by enterprise financial institutions, the user-friendly STAR platform empowers organizations to achieve regulatory compliance while safeguarding their integrity and business reputations. Through a customizable, 360-degree view of employee activity, the STAR software enables firms to automate the detection and resolution of potential areas of conflict while streamlining daily workflows and increasing efficiency.

At StarCompliance, we operate a distributed, multi-tenant SaaS platform supporting critical compliance needs for global clients. The ability to safely and repeatedly promote change into production, across our entire platform estate, is central to our success.

We are seeking a Senior / Lead Site Reliability Engineer to provide technical leadership and custodianship of our production promotion, observability, and reliability practices. This associate director-level role reports into the Head of Platform Engineering and is suited to an experienced SRE leader who combines strong operational judgement with solid software engineering experience and a deeply hands-on approach.

The role blends strategic responsibility with hands-on execution. You will work closely with Technical Leads and QA to shape how systems are promoted through environments, ensuring that quality, performance, and operational readiness are built into the promotion process. You will remain actively involved in day-to-day engineering, on-call and rota support, and continuous improvement of production systems, while contributing to the wider platform strategy.

How you work matters. We value leaders who take ownership of outcomes, communicate clearly under pressure, and raise the operational maturity of the organisation through action as well as direction.

Requirements

  • Strong hands-on experience operating distributed, cloud-hosted SaaS platforms at scale.
  • Professional experience with at least one modern programming language.
  • Strong experience with Microsoft Azure, including core platform services, networking, identity, and security.
  • Deep expertise in observability tooling and practices.
  • Experience improving production promotion, deployment, and release processes.
  • Experience with Infrastructure as Code and automation-driven operations.
  • Strong understanding of failure modes, resilience patterns, and recovery strategies.
  • Ability to influence senior stakeholders through technical credibility and pragmatism.
  • Based in the East Coast time zone.
  • Typically 8+ years of experience in SRE, platform, operational, or software engineering roles, with much of that time spent in multi-tenant environments.
  • Experience supporting production systems with formal on-call or rota responsibility.
  • Experience leading and mentoring a team of SREs, with an emphasis on professional and personal growth.
  • Experience enabling regular, multi-service production releases at scale.
  • Right to work in the country of employment.

Nice To Haves

  • Experience working with or supporting .NET-based systems is highly beneficial.

Responsibilities

  • Act as a senior custodian of the production promotion process across the software platform estate.
  • Work closely with Technical Leads and QA to define and evolve promotion practices that emphasise quality, performance, and operational readiness.
  • Define and evolve observability standards across metrics, logging, tracing, and alerting.
  • Ensure systems are instrumented to support rapid diagnosis, learning, and recovery.
  • Drive continuous improvement in platform reliability, performance, and release confidence.
  • Partner with engineering, architecture, and platform teams to embed operability and resilience into system design.
  • Lead and participate in on-call and rota-based operational support for production systems.
  • Coordinate and continuously improve incident management practices, including post-incident reviews and preventative actions.
  • Act as a senior technical authority for production readiness, operational risk, and release confidence.
  • Mentor SREs and senior engineers, raising reliability and operational standards across teams.
  • Influence architectural and platform decisions with a strong operational and delivery lens while remaining hands-on.