Software Engineer - MLOps

Phare Health and R1 RCM
$140,000 - $300,000 · Hybrid

About The Position

Phare Health is now part of R1 and its AI innovation engine, R37 Lab, bringing Phare’s frontier clinical reasoning technology together with one of the largest healthcare platforms in the U.S. At R37 and Phare, we are building the first AI-native Healthcare Revenue Operating System: a connected platform that reasons over full medical records, payer logic, and financial workflows to automate medical coding, billing, and follow-up.

Backed by real customers, real data, and real distribution, we operate at national scale. Our agentic AI systems already power production workflows across 95 of the top 100 U.S. health systems, processing hundreds of millions of patient encounters each year, including:

  • 180M+ claims
  • 550M+ patient encounters
  • 1.2B+ workflow actions and outcomes

This is startup-level ownership with enterprise-level impact. If you want to build AI that ships, scales, and measurably improves how healthcare works, this is the place to do it.

You’ll own the production runtime for Phare’s ML stack: deploying, serving, and scaling models across inference endpoints and batch/streaming workflows. You’ll build progressive delivery pipelines with automated rollouts and rollbacks, manage SLOs for latency and availability, and instrument end-to-end observability (metrics, logs, traces, drift, regression). You’ll harden the platform with Terraform, Kubernetes, and CI/CD, ensuring reproducible, auditable ML releases.

We are hiring across several seniority levels, from mid-level to staff. At a minimum, we expect 5 years of software engineering experience and 2 years of MLOps experience. This is an in-person role in NYC, requiring at least 3 days per week in the SoHo office.

Requirements

  • 5 years of software engineering experience
  • 2 years of MLOps experience
  • Production ML: You’ve deployed and operated GPU-backed models in production, serving both APIs and batch/streaming inference
  • Platform engineering: Strong with Docker/Kubernetes, IaC (e.g., Terraform), and CI/CD for services and model artifacts; you maintain environment parity, reproducible releases, and robust model/experiment versioning with data lineage
  • System Reliability: You use progressive delivery with automated rollouts/rollbacks, and you build end-to-end observability (metrics, logs, traces, and model telemetry for drift/regression) plus actionable alerting, runbooks, and incident response
  • Post-training lifecycles: You manage model registries and stage gates, design scheduled or event-driven retraining when appropriate, and enforce RBAC, secrets management, encryption, and audit logs

Nice To Haves

  • Experience in regulated environments (e.g., healthcare, finance)

Responsibilities

  • Deploy, serve, and scale models across inference endpoints and batch/streaming workflows
  • Build progressive delivery pipelines with automated rollouts and rollbacks
  • Manage SLOs for latency and availability
  • Instrument end-to-end observability (metrics, logs, traces, drift, regression)
  • Harden the platform with Terraform, Kubernetes, and CI/CD, ensuring reproducible, auditable ML releases

Benefits

  • Top-of-market compensation (salary + equity)
  • Flexible PTO
  • Hybrid in-office (min. 3 days per week)
  • Comprehensive health benefits
  • 401(k) matching
  • Inspiring, brilliant, mission-driven teammates