Do you get excited by assessing the quality of LLM applications and driving their adoption? Our Evaluation organization is responsible for providing principled assessments across a diverse range of Apple features, from Search and Siri to the latest Apple Intelligence capabilities. Our team specializes in building LLM-as-judge (i.e., autograder) systems and related tooling to improve both the quality and efficiency of these evaluations. We are seeking a Principal Data Scientist to own the end-to-end quality analysis of these autograders, from defining rigorous validation frameworks to driving adoption across feature teams. This is a high-impact, high-visibility role at the intersection of data science, AI evaluation, and product quality.
Job Type
Full-time
Career Level
Senior
Education Level
Ph.D. or professional degree
Number of Employees
5,001-10,000 employees