Site Reliability Engineer (SRE)

Sonio

Remote

About The Position

Sonio is seeking its first Site Reliability Engineer (SRE) and first engineer in the US. This role will own the platform’s stability and releases, particularly during PST hours. The ideal candidate is a hybrid of a system administrator and a software engineer, capable of managing infrastructure and understanding the code running on it. This position offers high autonomy, requiring critical decision-making during incidents and ensuring the production environment is state-of-the-art, secure, and resilient.

The SRE will report to the Lead DevOps Engineer and will be responsible for US coverage for releases and incidents as the first responder during PST hours. This role involves bridging infrastructure and code by collaborating with the DevOps team on Kubernetes, Terraform, and AWS, with the ability to read and patch Elixir code. The SRE will drive incident response end-to-end, including triage, mitigation, and blameless post-mortems.

Key responsibilities include improving platform operability by defining SLOs, tuning alerts, and enhancing observability (metrics, logs, tracing). The role also involves transferring operational knowledge from France to the US by creating runbooks and documenting procedures. Additionally, the SRE will support compliance and security in a regulated medical-device environment, maintaining HIPAA-aligned controls and an audit-ready infrastructure.

Requirements

4+ years of experience in SRE, DevOps, or Production Engineering, including significant on-call experience on a 24/7 product.
Hybrid "code-literate" mindset, acting as an infrastructure expert who can also navigate a backend codebase to triage and patch issues independently.
Strong technical foundations in Kubernetes, Terraform, and AWS.
Ability to architect and tune your own observability signals.
Highly autonomous and comfortable making technical decisions with limited supervision.
Operational rigor and ability to stay calm under pressure.
Strong written English, sufficient to produce high-quality runbooks and handle async handoffs.
Ability to cover PST working hours.

Nice To Haves

Interest in Sonio's mission.

Responsibilities

Own US coverage for releases and incidents as the first responder during PST hours.
Bridge infra and code by working hand-in-hand with our DevOps team on Kubernetes, Terraform, and AWS, while being able to read and patch Elixir code to unblock yourself without waiting for a backend engineer.
Drive incident response end-to-end, managing triage, mitigation, and blameless post-mortems with real follow-through.
Improve the platform’s operability by defining SLOs, tuning alerts to reduce toil, and pushing observability (metrics, logs, tracing) where it’s lacking.
Transfer operational knowledge from France to the US by authoring runbooks and documenting procedures so local teams are empowered to act when something breaks.
Support compliance and security in our regulated medical-device environment, maintaining HIPAA-aligned controls and an audit-ready infrastructure.

Benefits

Health Insurance (Medical plan, vision, dental) - up to $30,000 per year
FSA & HSA
401(k) - up to 4% of your salary matched
Life Insurance - covering 2 times your salary, up to $200k
Attractive Parental Policy for primary and secondary caregivers
20 PTO days + 1 additional week offered between Christmas and New Year
Offices in Boston (HQ) & New York (incl. free breakfast, drinks & gym)
Flexible hours & remote policies
Commuter Benefits
One offsite per year in France & regular team building with US team
Ongoing training and continuous opportunities for professional growth and development, including unlimited access to coaching