About The Position

Siri is how hundreds of millions of people interact with Apple every day - asking questions, getting things done, and navigating their lives. The quality of that experience matters deeply, and this role sits at the heart of making it better. If you're passionate about AI quality, intelligent systems, and building the kind of rigorous evaluation infrastructure that makes great products possible - this is your opportunity to define what excellence looks like. You'll dig deep into complex failures across Siri's AI pipeline and shape the direction of a product used by people around the world.

As an Automation and Triage Engineer for Siri Quality, you will build the tools and systems that hold Siri to the highest standard. You'll design frameworks and automated pipelines that rigorously test how Siri performs across Apple platforms - evaluating whether it truly understands user intent, leverages available on-device context, and delivers an experience that feels effortless and intelligent. Your work transforms subjective quality into measurable signal, giving engineering and ML teams the clarity they need to move fast and ship with confidence.

Requirements

  • Bachelor's degree in Computer Science or a related field.
  • 8+ years of experience in a software development or test engineering role.
  • Demonstrated leadership in quality strategy and automation.
  • Strong software engineering fundamentals with hands-on experience in Python, Swift, or both.
  • Track record of building test automation frameworks, CI/CD pipelines, or evaluation infrastructure for complex software systems.
  • Experience with agentic coding systems and AI-assisted development tools, using them to accelerate implementation, prototype evaluation pipelines, and tackle complex engineering problems with speed and precision.
  • Familiarity with machine learning concepts and LLM-based systems - including evaluation methodologies, prompt design, and model behavior analysis.

Nice To Haves

  • Experience with on-device AI, natural language understanding, or conversational systems is a strong plus.
  • Familiarity with scenario-based testing or agent trajectory analysis.
  • Prior work on quality or reliability for consumer-facing AI products is especially valued.

Responsibilities

  • Build the tools and systems that hold Siri to the highest standard.
  • Design frameworks and automated pipelines that rigorously test how Siri performs across Apple platforms.
  • Evaluate whether Siri truly understands user intent, leverages available on-device context, and delivers an experience that feels effortless and intelligent.
  • Transform subjective quality into measurable signal, giving engineering and ML teams the clarity they need to move fast and ship with confidence.
  • Dig deep into complex failures across Siri's AI pipeline.
  • Shape the direction of a product used by people around the world.