AI Engineer

Roger Healthcare•San Francisco, CA

$200,000 - $250,000•Hybrid

About The Position

Roger is an AI platform that frees home health clinicians from paperwork so they can focus on what matters: delivering life-changing care to our most vulnerable elderly patients in the comfort of their homes. Backed by leading healthcare investors like SignalFire, we've powered millions of visits, deployed to thousands of clinicians, and we're just getting started.

The administrative work behind home health is messy, unstructured, and high-stakes. From clinical documentation to the operational busywork that organizations serving these vulnerable populations face every day, turning dense, nuanced information into accurate records requires LLM systems that extract structure from unstructured data, validate their own outputs, learn from a vast and growing dataset, and continuously improve in accuracy.

We are now looking for an AI Engineer to build the intelligence layer at the core of Roger: training and fine-tuning models on our data, building the eval and monitoring infrastructure that keeps them accurate, and shipping LLM-powered workflows that clinicians rely on every day. There is a wide gap between a demo of an AI product and one that actually works for real clinicians caring for real patients, and we are hiring engineers who build on the right side of that gap.

Requirements

7+ years of professional software engineering experience, with meaningful depth in AI/ML.
Experience training, fine-tuning, or evaluating LLMs and open source models, with real opinions about what works and what does not.
Experience shipping real AI software to real users. You can describe something you built, what broke, and how you fixed it.
Strong instincts for evals, observability, and the feedback loops that turn user feedback into measurable improvement.
Experience building agentic systems: tool use, generator and critic loops, planners and executors, and orchestration where one agent's output drives another's work.
Comfort building infrastructure that is fast, reliable, and cost-efficient at scale.
Startup experience shipping real features in high-growth environments.
A product mindset, comfort across the stack, and the ability to operate in ambiguity without a clean spec.
High standards for reliability and accuracy when real clinicians and patients depend on your work.

Responsibilities

Train and fine-tune open source models, leveraging our vast proprietary dataset to push accuracy beyond what off-the-shelf models can do.
Build eval datasets and pipelines that let us measure model accuracy rigorously and improve it continuously.
Design how we measure accuracy in the first place: the metrics, harnesses, and feedback loops that turn real clinical outcomes into measurable model improvements.
Build scalable, cost-efficient inference infrastructure with great monitoring and observability.
Build better agentic infrastructure and partner on the interfaces that turn model capability into a great clinician experience.
Stay at the frontier: keep up with the latest research, frontier model capabilities, and open source frameworks, and bring the best of it into production.
Prototype quickly, then harden into scalable, secure, and reliable production systems.