About The Position

This role focuses on developing, carrying out, interpreting, and communicating pre- and post-ship evaluations of the safety of Apple Intelligence features. These evaluations are powered by a thoughtful combination of human grading and model-based auto-grading. Additionally, this role researches and develops auto-grading methodology and infrastructure to benefit ongoing and future Apple Intelligence safety evaluations. Producing safety evaluations that uphold Apple’s Responsible AI values requires thoughtful sampling, creation, and curation of evaluation datasets; high-quality, detailed annotations and careful auto-grading to assess feature performance; and mindful analysis to understand what an evaluation means for the user experience. This role draws heavily on applied data science, scientific investigation and interpretation, cross-functional communication and collaboration, and metrics reporting and presentation.
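
For illustration only: a common precursor to trusting a model-based auto-grader at scale is checking how closely its verdicts track human annotations on the same evaluation set. The sketch below assumes hypothetical "safe"/"unsafe" verdicts and uses scikit-learn's cohen_kappa_score; it is not a description of Apple's actual tooling.

```python
# Illustrative only: validating a model-based auto-grader against human
# safety annotations. Label values and data below are hypothetical.
from sklearn.metrics import cohen_kappa_score

# Hypothetical per-item verdicts ("safe" / "unsafe") on the same eval set.
human_labels = ["safe", "unsafe", "safe", "safe", "unsafe"]
auto_labels  = ["safe", "unsafe", "unsafe", "safe", "unsafe"]

# Raw agreement rate between the auto-grader and the human annotators.
agreement = sum(h == a for h, a in zip(human_labels, auto_labels)) / len(human_labels)

# Cohen's kappa corrects agreement for chance, which matters when one
# class (e.g., "safe") dominates the evaluation set.
kappa = cohen_kappa_score(human_labels, auto_labels)

print(f"agreement={agreement:.2f}, kappa={kappa:.2f}")
```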

Requirements

  • MS or PhD in Computer Science, Machine Learning, Statistics, or a related field; or equivalent qualifications acquired through other avenues.
  • Experience working with generative models for evaluation and/or product development, and up-to-date knowledge of their common challenges and failure modes.
  • Strong engineering skills and experience in writing production-quality code in Python.
  • Deep experience in foundation model-based AI programming (e.g., using DSPy to optimize foundation model prompts) and a drive to innovate in this space; see the sketch after this list.
  • Experience working with noisy, crowd-based data labels and human evaluations.
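
As a concrete, purely illustrative example of this kind of foundation model-based programming, the sketch below uses DSPy's public API to build and optimize an LLM safety grader. The LM identifier, labeled examples, and metric are hypothetical placeholders, not a description of Apple's actual pipeline.

```python
# A minimal sketch of prompt optimization with DSPy; the LM identifier,
# training examples, and metric below are hypothetical placeholders.
import dspy
from dspy.teleprompt import BootstrapFewShot

# Configure the underlying language model (identifier is a placeholder).
dspy.configure(lm=dspy.LM("openai/gpt-4o-mini"))

class SafetyJudgment(dspy.Signature):
    """Decide whether a model response to a prompt is safe."""
    prompt: str = dspy.InputField()
    response: str = dspy.InputField()
    verdict: str = dspy.OutputField(desc="either 'safe' or 'unsafe'")

# Chain-of-thought grader derived from the signature above.
grader = dspy.ChainOfThought(SafetyJudgment)

# A few human-labeled examples (contents elided; purely illustrative).
trainset = [
    dspy.Example(prompt="...", response="...", verdict="safe")
        .with_inputs("prompt", "response"),
    dspy.Example(prompt="...", response="...", verdict="unsafe")
        .with_inputs("prompt", "response"),
]

# Metric: exact agreement with the human verdict.
def agrees_with_human(example, prediction, trace=None):
    return example.verdict == prediction.verdict

# BootstrapFewShot searches for few-shot demonstrations that maximize
# the metric, yielding an optimized grader program.
optimized_grader = BootstrapFewShot(metric=agrees_with_human).compile(
    grader, trainset=trainset
)
```

Calling optimized_grader(prompt=..., response=...) would then return a prediction whose verdict field can be compared against human grades, as in the agreement sketch above.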

Nice To Haves

  • Experience working in the Responsible AI space.
  • Prior scientific research and publication experience.
  • Strong organizational and operational skills working with large, multi-functional, and diverse teams.
  • Curiosity about fairness and bias in generative AI systems, and a strong desire to help make the technology more equitable.