About The Position

You will work with top-tier data scientists, engineers, research teams, and product teams across Apple to help ensure we deliver the high-quality, safe, and beneficial AI-powered experiences that over 1 billion customers expect and love. This role requires technical depth in evaluation methodologies combined with strong program management expertise to drive comprehensive assessment of model capabilities, safety, helpfulness, and user experience quality.

Requirements

  • Bachelor's degree in Statistics, Business Intelligence, Computer Science, or another quantitative science or related field, or equivalent experience
  • 8+ years of experience driving large-scale programs that build machine-learning-powered products or analytics to support product development
  • 5+ years of experience managing programs in the AI-powered product space, preferably including evaluation of ML/AI products
  • Ability to handle ambiguity, drive clarity around evaluation methodologies, and shepherd multiple teams to converge on rigorous measurement frameworks
  • Experience designing and implementing evaluation systems for machine learning models, particularly large language models or conversational AI systems
  • Program management skills, including structuring programs and managing multiple interdependent work streams across research, engineering, and product teams
  • Problem-solving skills with attention to detail in identifying edge cases, failure modes, and capability gaps
  • Ability to communicate abstract ideas clearly and deliver comprehensive yet succinct program status updates to audiences at all levels, both verbally and in writing
  • Proven adaptability and agility in adjusting program strategy and plans as model capabilities and product decisions evolve

Nice To Haves

  • Master's or PhD degree in Statistics, Machine Learning, Computer Science, or another quantitative science or related field, or equivalent experience
  • Experience with statistical analysis and drawing meaningful conclusions from large-scale evaluation datasets
  • Deep understanding of LLM capabilities, limitations, and safety considerations
  • Self-sufficient in analyzing and drawing conclusions about model quality, user experience, and product opportunity from raw and refined evaluation data
  • Player-coach capable of personally leading large evaluation initiatives while mentoring team members to grow their evaluation expertise