About The Position

About Us

At Sully.ai, We’re Building the Most Impactful Healthcare Company on Earth

We believe that access to a great doctor is a basic human right. Today, that’s not a reality. Delays, misdiagnoses, administrative chaos, and burnout plague the system.

Our Mission: One Human, One Doctor.

We build AI teammates that augment clinicians (scribes, nurses, receptionists, translators), all powered by our own world-class models and deployed in real-world care.

Our Traction

  • 450+ organizations signed in 16 months
  • AI agents cut admin by ~2.8 hours daily and reduce onboarding by 85%
  • 5M+ clinical tasks completed to date, serving 36+ specialties
  • Raised $25M from YC, Eric Yuan, Amity, Semper Virens
  • Patented AI architecture (MedCon-1) outperforms GPT-4.5, Gemini, Claude on clinical reasoning tasks

Sully requires A-players capable of delivering a year of output in 4 months. If you’ve ever said, “I want to do work that actually matters”, this is it. Let’s build something life-changing, together.

Why Join Sully.ai?

  • 🔥 Revolutionizing the antiquated $800B+ healthcare market
  • 🧠 50%+ of us are ex-founders. We hire A-players, not passengers
  • ⚡️ Speed matters - we operate with urgency, autonomy, and ownership
  • 🧪 You’ll work on real, first-of-their-kind problems at the edge of AI and medicine
  • ❤️ Your work helps doctors reclaim their time - and patients get better, faster care

Sully.ai is an equal opportunity employer. In addition to EEO being the law, it is a policy that is fully consistent with our principles. All qualified applicants will receive consideration for employment without regard to status as a protected veteran or a qualified individual with a disability, or other protected status such as race, religion, color, national origin, sex, sexual orientation, gender identity, genetic information, pregnancy or age. Sully.ai prohibits any form of workplace harassment.

Requirements

  • Proven experience designing agentic processes and LLM evaluation/benchmarking frameworks.
  • Strong Python and ML background (PyTorch/TensorFlow, Hugging Face, LangChain/LlamaIndex).
  • Demonstrated ability to design rigorous experiments and translate findings into production.
  • Track record of published research or deep applied work in LLMs and agent evaluation.
  • Strong communication and technical writing skills to articulate complex findings clearly.

Responsibilities

  • Build and scale automated evaluation pipelines (LLM-as-judge + human review) with clinical-grade benchmarks.