This role focuses on developing, carrying-out, interpreting, and communicating pre- and post-ship evaluations of the safety of Apple Intelligence features. Both human grading and model-based auto-grading are thoughtfully leveraged to power these evaluations. Additionally, this role researches and develops auto-grading methodology & infrastructure to benefit ongoing and future Apple Intelligence safety evaluations. Producing safety evaluations that uphold Apple’s Responsible AI values requires thoughtful data sampling, creation, and curation for evaluation datasets; high quality, detailed annotations and careful auto-grading to assess feature performance; and mindful analysis to understand what the evaluation means for the user experience. This role heavily draws on applied data science, scientific investigation and interpretation, cross-functional communication and collaboration, and metrics reporting and presentation.
Stand Out From the Crowd
Upload your resume and get instant feedback on how well it matches this job.
Job Type
Full-time
Career Level
Mid Level