Our team, part of Apple Services Engineering, is building the scientific foundation for how AI systems are evaluated across Apple. We are seeking a Measurement Scientist to ensure that our evaluation methods are not just sophisticated, but scientifically valid and trustworthy. In this role, you will apply psychometric theory, validity frameworks, and statistical rigor to establish measurement standards for AI evaluation, ensuring that when we claim an evaluator measures "helpfulness" or "safety," it actually does. We are looking for individuals across a range of experience levels. This role uniquely bridges measurement science and cutting-edge AI evaluation. You will develop methods for validating LLM-as-judge evaluators, automated benchmarks, and human evaluations, and you will create statistical tools that help engineers trust their evaluation results. You will work on an interdisciplinary team with ML researchers to solve new problems in AI evaluation. Your work will be both published at top measurement and ML venues and productionized into the evaluation SDK used across Apple. The successful candidate will have deep expertise in psychometrics and measurement theory, with the ability to apply these principles to novel AI evaluation challenges. You will work collaboratively with ML researchers, platform engineers, and evaluation practitioners to translate measurement science into practical tools that scale across the organization.
Job Type: Full-time
Career Level: Mid Level
Education Level: Ph.D. or professional degree