As an AI Engineer - Evaluations at Hippocratic AI, you'll define and build the systems that measure, validate, and improve the intelligence, safety, and empathy of our voice-based generative healthcare agents. Evaluation sits at the heart of our model improvement loop - it informs architecture choices, training priorities, and launch decisions for every patient-facing agent. You'll design LLM-based auto-evaluators, agent harnesses, and feedback pipelines that ensure each model interaction is clinically safe, contextually aware, and grounded in healthcare best practices. You'll collaborate closely with research, product, and clinical teams, working across the stack - from backend data pipelines and evaluation frameworks to tooling that surfaces insights for model iteration. Your work will directly shape how our agents behave, accelerating both their reliability and their real-world impact.
Job Type: Full-time
Career Level: Mid Level
Education Level: Not listed
Number of Employees: 101-250