We're looking for an Applied AI Evaluation Scientist: someone who sits at the intersection of data science, information retrieval, machine learning, and product thinking. This person will own the quality and trustworthiness of our AI/ML systems by designing, building, and running rigorous evaluation frameworks. The primary focus will be our Agentic Retrieval-Augmented Generation (RAG) pipelines, optimizing how we chunk, embed, retrieve, rank, and generate, but the role extends to evaluating other AI/ML systems across the company.

The ideal candidate has the judgment to know what is and isn't worth evaluating, and the statistical grounding to ensure that the evaluations they do run are sound, realistic, and actionable. Balancing resource capacity against velocity is key: knowing what to measure, and how to measure it, to drive improvements for our customers is paramount. You will work closely with Product and Engineering. Your code doesn't need to be production-hardened, but it must achieve its intended outcomes. Think research-quality Python, clear notebooks, and reproducible experiments, not bulletproof microservices.
Job Type: Full-time
Career Level: Mid Level