The Productivity and Machine Learning Evaluation team ensures the quality of AI-powered features across a suite of productivity and creative applications, including Creator Studio. This team serves as the primary evaluation function, providing critical quality signals that directly influence model development decisions and product launches. This role focuses on building and scaling automated evaluation systems and designing adversarial and stress-testing methodologies across multiple AI features. The work requires a deep understanding of how AI systems fail and how to measure quality rigorously. As features evolve from single-turn interactions into multi-turn, agentic experiences, the evaluation challenge shifts from assessing individual outputs to stress-testing entire conversation flows and agent decision chains. This is an opportunity to shape the evaluation infrastructure that determines whether AI features meet the bar for hundreds of millions of users.
Stand Out From the Crowd
Upload your resume and get instant feedback on how well it matches this job.
Job Type
Full-time
Career Level
Senior
Number of Employees
5,001-10,000 employees