AIML - Sr Applied AI Scientist - GenAI Model Autograding, Evaluation

Apple • Cupertino, CA

About The Position

Do you get excited by building AI applications that enhance the evaluation of Apple AI products? Our Evaluation organization is responsible for providing principled assessments across a diverse range of Apple features, from Search and Siri to the latest Apple Intelligence capabilities. Within this critical function, our team specializes in leveraging advanced AI/ML techniques to enhance both the quality and efficiency of these comprehensive evaluations. We are seeking a highly innovative and passionate Applied AI Scientist to develop cutting-edge AI/ML models for the automatic grading and quality assessment of our internal GenAI products.

Description

In this pivotal role, you will design and advance state-of-the-art autograder systems that evaluate the quality of various AI products at scale. You will apply deep expertise in prompt engineering, foundation model adaptation, and evaluation methodology to build robust, trustworthy, and extensible autograders that assess product performance, user experience, and adherence to quality and safety standards. You will then collaborate with product and annotation teams, evaluation data scientists, and autograder tooling engineers to deploy these autograders, directly impacting the quality and success of Apple’s next-generation AI-powered features.

Requirements

Extensive experience with prompting techniques.
Deep understanding of GenAI models and 1+ year of industry experience in building or evaluating GenAI models.
Familiarity with LLMOps processes for deploying, monitoring and hillclimbing AI models in production environments.
Excellent analytical skills and judgement, capable of assessing data quality, diagnosing autograder limitations or biases, synthesizing findings into actionable insights, and communicating them clearly across teams.
Ownership mindset with the flexibility to take on whatever tasks—including annotation or operational work—are necessary to deliver results.

Nice To Haves

Experience in developing AI models specifically for quality assessment or automated feedback generation.
Familiarity with human annotation operations, sampling strategies, or subjective judgment evaluation.
Experience in designing human-in-the-loop evaluation workflows.
Familiarity with image quality evaluation.
Demonstrated passion for leveraging AI to improve work efficiency and scale.
