About The Position

We are looking for talented machine learning engineers who are excited to tackle some of the most meaningful and technically challenging problems in building and deploying foundation model–based products for our customers. As a Machine Learning Engineer focused on foundation model evaluation, you will play a critical role in assessing the capabilities of the models that power Apple Intelligence features. You will be entrusted with ensuring that foundation model performance can be measured quickly and reliably to support model shipping decisions. You will design, implement, and maintain evaluation infrastructure, and collaborate extensively with ML researchers on both model hillclimbing and novel methodologies for measuring model performance, translating evaluation insights into actionable improvements to future models. Your responsibilities will span several high-impact parts of the Apple product and foundation model lifecycle.

Requirements

  • 5+ years of hands-on ML engineering experience, including at least 1 year working directly on large language models or generative AI.
  • Bachelor’s, Master’s, or PhD in Computer Science, Machine Learning, or a related technical field — or equivalent practical experience.
  • Strong software engineering fundamentals: debugging, testing, code reviews, and production reliability / scalability.
  • Hands-on experience with LLM training and/or evaluation workflows, including any of the following: pre-training, post-training, online evaluation, offline evaluation, automated evaluation, or human evaluation.

Nice To Haves

  • Hands-on experience evaluating large language models at scale or designing large language model benchmarks.
  • Strong communication skills, with the ability to convey important information clearly and concisely.
  • Self-motivated and curious.
  • A drive to continually learn on the job.
  • Strong creative and critical thinking skills, with an innate drive to improve how things work.
  • A high tolerance for ambiguity and the ability to identify the most important problems to solve.

Responsibilities

  • Design, implement, and maintain crucial evaluation infrastructure.
  • Collaborate extensively with ML researchers on both model hillclimbing and the development of novel methodologies for measuring model performance.
  • Translate evaluation insights into actionable improvements that advance future model performance.