Product Reliability Engineer

PointOne•New York City, NY

7d•Onsite

About The Position

PointOne builds infrastructure for the legal industry, powering timekeeping and billing systems used by law firms and government agencies. We’re a venture-backed startup (Y Combinator, Bessemer, 8VC, General Catalyst) made up of software engineers (Jane Street, Google, Stanford, Princeton) and ex-attorneys. To keep up with inbound customer demand, we are quickly scaling our engineering team following a $16M Series A.

We process the most confidential data for institutions working on the most sensitive matters, and our customers depend on us being up, accurate, and fast, always. We're hiring a Product Reliability Engineer to own the health, stability, and observability of our systems end-to-end.

The Role

You are the connective tissue between our customers and our product. You’re a product engineer focused on a key dimension of the user experience: reliability. When reliability is threatened, you'll work directly with customers to understand impact, stop the bleeding, create a robust fix, and then close the loop.

However, this role isn't only reactive. The best PREs use front-line signals to make the whole system more resilient: better observability, fewer recurring failures, and proactive investments that get ahead of problems before customers feel them. This is a hands-on, full-stack engineering role at the intersection of product, infrastructure, and customer impact.

Requirements

2+ years of software engineering experience, with meaningful time spent in reliability, platform, or production-facing roles
Strong debugging instincts and comfort tracing failures across distributed systems using logs, traces, and metrics
Hands-on experience with AWS (Lambda, SQS, RDS, CloudWatch or equivalent)
Comfortable reading and writing Go, TypeScript, or similar backend languages
Experience building or improving observability infrastructure (alerting, dashboards, telemetry)
High ownership mentality: you close the loop, you write the postmortem, you ship the fix

Nice To Haves

Experience in legaltech, fintech, healthtech, or other high-sensitivity, always-on environments

Responsibilities

Respond quickly to automated alerts and customer-reported issues
Triage, diagnose, and resolve production incidents with a bias toward permanent fixes over workarounds
Build and maintain incident response playbooks and postmortem processes
Coordinate cross-functionally with customer success managers and key account stakeholders to maintain customer trust in the event of an incident
Design and instrument telemetry, logging, and alerting across our serverless AWS stack
Build dashboards and health metrics that surface issues before customers feel them
Identify recurring failure patterns and drive systemic fixes into the codebase
Reduce operational toil through automation
Contribute directly to the codebase—improving resilience, reducing tech debt, and creating automation to ensure bugs are resolved quickly and with little human intervention
Partner with engineers on new feature launches to assess reliability risks before they ship
Make data-driven recommendations on where to invest in stability