Site Reliability Engineer (SRE)

Sonio•Boston, MA

10d•$165,000 - $190,000•Hybrid

About The Position

Sonio is seeking its first Site Reliability Engineer (SRE) and first engineer in the US to own the platform’s stability and releases, particularly during PST hours. This role requires a hybrid profile, combining system administration and software engineering skills. The SRE will operate with high autonomy, making critical decisions during incidents and ensuring the production environment is state-of-the-art, secure, and resilient. The position reports to the Lead DevOps Engineer and involves bridging infrastructure and code by working with Kubernetes, Terraform, and AWS, with the ability to read and patch Elixir code. Key responsibilities include driving incident response end-to-end, improving platform operability through SLO definition and alert tuning, enhancing observability, transferring operational knowledge from France to the US via runbooks and documentation, and supporting compliance and security in a regulated medical-device environment with HIPAA-aligned controls.

Requirements

4+ years of experience in SRE, DevOps, or Production Engineering, including significant on-call experience on a 24/7 product.
Hybrid "code-literate" mindset, acting as an infrastructure expert who can also navigate a backend codebase to triage and patch issues independently.
Strong technical foundations in Kubernetes, Terraform, and AWS, along with the ability to architect and tune your own observability signals.
Highly autonomous and comfortable making technical decisions with limited supervision.
Operational rigor and ability to stay calm under pressure.
Written English skills necessary to produce high-quality runbooks and handle async handoffs.
Interest in Sonio's mission.

Responsibilities

Own US coverage for releases and incidents as the first responder during PST hours.
Bridge infra and code by working hand-in-hand with our DevOps team on Kubernetes, Terraform, and AWS, while being able to read and patch Elixir code to unblock yourself without waiting for a backend engineer.
Drive incident response end-to-end, managing triage, mitigation, and blameless post-mortems with real follow-through.
Improve the platform’s operability by defining SLOs, tuning alerts to reduce toil, and pushing observability (metrics, logs, tracing) where it’s lacking.
Transfer operational knowledge from France to the US by authoring runbooks and documenting procedures so local teams are empowered to act when something breaks.
Support compliance and security in our regulated medical-device environment, maintaining HIPAA-aligned controls and an audit-ready infrastructure.

Benefits

Health Insurance (medical, vision, dental) - up to $30,000 per year + FSA & HSA
401(k) - up to 4% of your salary matched
Life Insurance - covering 2 times your salary, up to $200k
An attractive Parental Policy for primary and secondary caregivers
20 PTO days + 1 week off between Christmas and New Year
Offices in Boston (HQ) & New York (incl. free breakfast, drinks & gym)
Flexible hours & remote policies
Commuter Benefits
One offsite per year in France & regular team building with the US team
Ongoing training and continuous opportunities for professional growth and development, including unlimited access to coaching