Senior AI Safety Clinical Reviewer (TPM)

mpathic | Seattle, WA
$140,000 - $190,000 | Remote

About The Position

mpathic is building the future of empathetic, trustworthy AI. Grounded in behavioral science and human-centered design, our technology delivers AI systems that are safe, aligned, and emotionally intelligent. As enterprises race to adopt AI, we believe the companies that win will be those that build trust first.

We are building a high-quality AI Safety Team to evaluate and strengthen advanced AI systems. Our work focuses on making models reliable, auditable, and scalable, so safety work can move fast without relying on heroics or sacrificing quality.

We’re looking for a Clinical Reviewer & Trainer who is a licensed or clinically trained expert with deep experience in psychological safety, qualitative evaluation, and clinical supervision or training. This role owns clinical QA systems rather than just participating in them, making final calls on clinical quality within defined guardrails. The ideal hire will manage, mentor, and set standards for clinicians completing diverse tasks that range from data generation and benchmarking to evaluation, safety-rubric development, and red teaming.

The role is roughly 50% clinical QA and adjudication and 50% expert training and calibration. It does not involve providing therapy, crisis intervention, or on-call clinical care.

In this role, you will serve as the clinical quality backbone of our evaluation programs, ensuring that expert raters apply rubrics consistently, edge cases are adjudicated rigorously, and our outputs reflect high-quality clinical judgment in emotionally complex AI interactions. You’ll operate at the intersection of:

  • Clinical judgment and supervision
  • AI safety evaluation rigor
  • Expert training and calibration
  • Rubric interpretation and edge-case adjudication
  • Qualitative data review and synthesis
  • Scalable QA systems

Requirements

  • 5+ years of experience in one or more of: Clinical supervision, training, or peer review; Clinical research operations or qualitative review; Trust & safety, human-centered AI, or behavioral science workflows; AI evaluation, annotation QA, or rubric-based assessment systems; Coaching clinicians or experts to apply shared standards in high-stakes domains
  • Licensed clinician (LCSW, LMFT, PsyD, PhD, MD) or equivalent applied experience
  • Deep familiarity with psychological stress, trauma, crisis response, or mental health frameworks
  • Expertise evaluating emotionally complex conversations for safety, appropriateness, and harm risk
  • Strong judgment in ambiguous edge cases involving distress, vulnerability, or sensitive disclosures
  • Synthesizing qualitative review data across large datasets to identify recurring patterns, failure modes, and sources of rater disagreement, and translating those insights into improved rubrics, training materials, and QA workflows
  • Training experts to apply nuanced rubrics consistently
  • Running calibration sessions and reducing disagreement across raters
  • Making and documenting difficult adjudication calls with rigor
  • Maintaining clinical quality without slowing execution
  • Updating guidance as edge cases and model behaviors evolve
  • Communicating clearly through structured feedback, examples, and reviewer notes
  • Helping other clinicians apply shared evaluation standards

Nice To Haves

  • Experience with AI safety evaluations
  • Experience with AI evaluation, annotation QA, or rubric-based assessment systems
  • Experience coaching clinicians or experts to apply shared standards in high-stakes domains

Responsibilities

  • Establish and evolve a reliable clinical QA and adjudication workflow across evaluation projects, identifying gaps and improving existing systems
  • Train and onboard expert raters on safety rubrics, shared standards, and evaluation philosophy
  • Run calibration sessions to reduce disagreement and improve consistency
  • Deliver adjudicated, high-confidence evaluation datasets for an initial pilot (e.g., stressful life events conversations)
  • Demonstrate calibrated clinical judgment aligned with mpathic’s safety philosophy
  • Design workflows that others can run without constant oversight
  • Own clinical quality systems across multiple concurrent AI safety evaluation programs
  • Develop scalable training, certification, and feedback loops for expert evaluators
  • Continuously improve inter-rater reliability, rubric clarity, and reviewer consistency
  • Build gold-standard examples and clinical playbooks for sensitive user contexts
  • Identify recurring model failure patterns and inform rubric/tooling improvements, delivering outputs that can withstand customer, partner, and/or audit scrutiny
  • Help shape mpathic’s long-term approach to emotionally grounded AI safety QA
  • Balance clinical nuance, operational clarity, and customer needs in real-world evaluation delivery
  • Review expert ratings for clinical appropriateness, consistency, and safety alignment
  • Serve as the escalation point for edge cases, ambiguous conversations, or disagreement clusters
  • Make final adjudication decisions and document rationale clearly
  • Ensure evaluation outputs are rigorous, clinically grounded, and customer-ready
  • Evaluate safety, appropriateness, and response quality—not clinical outcomes
  • Maintain responsible scope and handling of sensitive psychological content
  • Onboard and certify new expert raters on mpathic rubrics and evaluation philosophy
  • Develop training materials, examples, and calibration exercises
  • Coach clinicians and evaluators to engage in high-level qualitative analysis
  • Run ongoing learning loops to prevent rater drift over time
  • Implement review tiers: peer review → clinical QA → escalation/adjudication
  • Run inter-rater reliability measurement and disagreement reduction workflows
  • Maintain gold sets, anchor examples, and severity calibration standards
  • Identify patterns of confusion and update rubric guidance accordingly
  • Partner with behavioral science experts to refine rating criteria and edge-case handling
  • Surface recurring rubric gaps or unclear definitions
  • Ensure rubrics balance clinical nuance with operational usability
  • Work closely with TPMs and Evaluation Leads on delivery execution, workflows, and escalation systems
  • Work closely with Clinical & Behavioral Science Experts on rubric grounding and psychological frameworks
  • Work closely with QA Leadership on agreement metrics, gold sets, and drift monitoring
  • Work closely with Engineering / Product on tooling support for review, audit trails, and escalation queues
  • Work closely with Customer Delivery on ensuring findings are interpretable and trustworthy

Benefits

  • 100% company-funded health, dental, and vision insurance for full-time employees
  • 401k
  • Well-being programs
  • Flexible paid-time off