Senior Applied AI Scientist, Clinical AI Agents

R37 Lab, R1 RCM • New York, NY

Hybrid

About The Position

Phare Health is now part of R1 and its AI innovation engine, R37 Lab, bringing Phare’s frontier clinical reasoning technology together with one of the largest healthcare platforms in the U.S. At R37 and Phare, we are building the first AI-native Healthcare Revenue Operating System: a connected platform that reasons over full medical records, payer logic, and financial workflows to automate medical coding, billing, and follow-up. Our agentic AI systems already power production workflows across 95 of the top 100 U.S. health systems, processing hundreds of millions of patient encounters each year.

This is startup-level ownership with enterprise-level impact. If you want to build AI that ships, scales, and measurably improves how healthcare works, this is the place to do it.

We are looking for an Applied AI Engineer/Scientist to build, evaluate, and continuously improve clinical AI agents and supervised ML models. You will work at the intersection of software engineering, LLM systems, evaluation, model improvement, and deep healthcare workflow understanding. Your job is to turn frontier model capability into reliable production behavior: agents that read complex medical records, use the right clinical and coding context, call the right tools, produce auditable outputs, and improve from real-world failures.

You will be embedded in hard healthcare problems, including clinical documentation integrity, medical coding, denial prevention, appeals, revenue cycle workflows, and payer logic, and will own the loop from problem framing to agent design, evaluation, deployment, trace analysis, and ongoing improvement.

The ideal candidate is a strong engineer who thinks like an applied scientist: rigorous about measurement, comfortable with ambiguity, excited by messy real-world data, and motivated by closing the gap between impressive demos and dependable production systems.

Requirements

8+ years of software engineering, ML engineering, research engineering, or applied AI experience.
Highly proficient in Python and comfortable building production systems with APIs, structured data, async workflows, testing, logging, and observability.
Experience turning messy real-world workflows into structured AI problems, including classification, ranking, extraction, decisioning, LLM applications, agents, RAG, tool calling, structured outputs, prompting, or evaluation.
Experience building or operating evaluation systems, benchmarks, annotation workflows, experiment tracking, or regression tests for AI systems.
Ability to thrive in ambiguous, high-stakes domains: working with experts, debugging real-world failures, and turning model potential into reliable, correct, safe systems that work for users.

Nice To Haves

4+ years of software engineering, ML engineering, research engineering, or applied AI experience.

Responsibilities

Design, build, and iterate on agentic AI systems for complex healthcare workflows, including documentation, coding, denial management, appeals, and revenue cycle automation.
Develop long-horizon agent behavior across context construction, retrieval, tool use, memory, routing, verification, escalation, and human-in-the-loop review.
Define what “good” looks like for clinical agents end-to-end, translating expert workflows into specifications, rubrics, gold standards, test cases, and clinically meaningful success criteria.
Build rigorous evaluation and feedback loops using expert review, production logs, model outputs, and benchmarks to measure performance, regressions, edge cases, safety, reliability, provenance quality, and business impact.
Prototype new AI capabilities from 0 → 1, then harden them into reliable, explainable, auditable production systems with clear contracts, monitoring, evidence, rationale, and performance gates.
Partner with research and ML engineering teams on model selection, fine-tuning, reward modeling, distillation, synthetic data, post-training, and internal AI infrastructure, including instrumentation, experiment tracking, benchmarking, prompt/version management, and reproducible evaluation.