About The Position

We are seeking talented engineers to join our team and push the boundaries of evaluation for Siri AI Agents. Evaluation lies at the heart of our model development strategy: it shapes architectural choices, guides launch decisions, and ultimately ensures a world-class user experience. Our team is highly innovative and fast-moving, leveraging auto-evaluators and LLM-based judges to measure, validate, and continuously improve the core Siri AI engine. If you're excited by the challenge of building trusted evaluation systems that directly impact the quality of a groundbreaking AI product used by millions worldwide, this role is for you.

Requirements

  • Experience with large-scale ML model evaluation, testing pipelines, and triage.
  • Knowledge of data generation, training workflows, or context engineering.
  • Familiarity with real-world deployment challenges for AI/ML products.
  • Knowledge of the latest methodologies in LLM evaluation.

Responsibilities

  • Design, build, and maintain auto-evaluators that measure the quality of Siri's core AI engine.
  • Identify and triage issues, and implement changes that improve auto-evaluator trustworthiness.
  • Work with both simulators and real devices to ensure high-fidelity evaluation and a superior user experience.
  • Collaborate with scientists and engineers across software and ML teams, contributing to products shipped across our portfolio of devices.

What This Job Offers

Industry: Computer and Electronic Product Manufacturing
Number of Employees: 5,001-10,000
