Applied AI Scientist, Clinical AI Agents

R37 Lab, R1 RCM · New York, NY · Hybrid

About The Position

Phare Health is now part of R1 and its AI innovation engine, R37 Lab, bringing Phare’s frontier clinical reasoning technology together with one of the largest healthcare platforms in the U.S. At R37 and Phare, we are building the first AI-native Healthcare Revenue Operating System: a connected platform that reasons over full medical records, payer logic, and financial workflows to automate medical coding, billing, and follow-up. Backed by real customers, real data, and real distribution, we operate on a national scale. Our agentic AI systems already power production workflows across 95 of the top 100 U.S. health systems, processing hundreds of millions of patient encounters each year. This is startup-level ownership with enterprise-level impact. If you want to build AI that ships, scales, and measurably improves how healthcare works, this is the place to do it.

Requirements

4+ years of software engineering, ML engineering, research engineering, or applied AI experience.
Highly proficient in Python and comfortable building production systems with APIs, structured data, async workflows, testing, logging, and observability.
Experience turning messy real-world workflows into structured AI problems (classification, ranking, extraction, decisioning) and solving them with LLM applications, agents, RAG, tool calling, structured outputs, prompting, or evaluation.
Experience building or operating evaluation systems, benchmarks, annotation workflows, experiment tracking, or regression tests for AI systems.
Ability to thrive in ambiguous, high-stakes domains: working with domain experts, debugging real-world failures, and turning model potential into reliable, correct, and safe systems that work for users.

Responsibilities

Design, build, and iterate on agentic AI systems for complex healthcare workflows, including documentation, coding, denial management, appeals, and revenue cycle automation.
Develop long-horizon agent behavior across context construction, retrieval, tool use, memory, routing, verification, escalation, and human-in-the-loop review.
Define what “good” looks like for clinical agents end-to-end, translating expert workflows into specifications, rubrics, gold standards, test cases, and clinically meaningful success criteria.
Build rigorous evaluation and feedback loops using expert review, production logs, model outputs, and benchmarks to measure performance, regressions, edge cases, safety, reliability, provenance quality, and business impact.
Prototype new AI capabilities from 0 → 1, then harden them into reliable, explainable, auditable production systems with clear contracts, monitoring, evidence, rationale, and performance gates.
Partner with research and ML engineering teams on model selection, fine-tuning, reward modeling, distillation, synthetic data, post-training, and internal AI infrastructure, including instrumentation, experiment tracking, benchmarking, prompt/version management, and reproducible evaluation.