Our team, part of Apple Services Engineering, is looking for an ML Research Engineer to lead the design and continuous development of automated safety benchmarking methodologies. In this role, you will investigate how media-related agents behave, develop rigorous evaluation frameworks and techniques, and establish scientific standards for assessing the risks they pose and their safety performance.

This role supports the development of scalable evaluation techniques that give our engineers the right tools to assess candidate models and product features for responsible, safe performance. The capabilities you build will enable the generation of benchmark datasets and evaluation methodologies for model and application outputs at scale, allowing engineering teams to translate safety insights into actionable engineering and product improvements.

The role blends deep technical expertise with strong analytical judgment to develop tools and capabilities for assessing and improving the behavior of advanced AI/ML models. You will work cross-functionally with Engineering, Project Management, Product, and Governance teams to develop a suite of technologies that ensure AI experiences are reliable, safe, and aligned with human expectations.

The successful candidate will take a proactive approach to working both independently and collaboratively on a wide range of projects. You will work alongside a small but impactful team, collaborating with ML and data scientists, software developers, project managers, and other groups at Apple to understand requirements and translate them into scalable, reliable, and efficient evaluation frameworks.
Job Type
Full-time
Career Level
Mid Level
Number of Employees
5,001-10,000 employees