About The Position

Siri is how hundreds of millions of people interact with Apple every day - asking questions, getting things done, and navigating their lives. The quality of that experience matters deeply, and this role sits at the heart of making it better. If you're passionate about AI quality, intelligent systems, and building the kind of rigorous evaluation infrastructure that makes great products possible - this is your opportunity to define what excellence looks like. You'll dig deep into complex failures across Siri's AI pipeline and shape the direction of a product used by people around the world.

As an Automation and Triage Engineer for Siri Quality, you will build the tools and systems that hold Siri to the highest standard. You'll design frameworks and automated pipelines that rigorously test how Siri performs across Apple platforms - evaluating whether it truly understands user intent, leverages available on-device context, and delivers an experience that feels effortless and intelligent. Your work transforms subjective quality into measurable signal, giving engineering and ML teams the clarity they need to move fast and ship with confidence.

Requirements

  • Bachelor's Degree in Computer Science or related field.
  • 8+ years of experience in a software development or test engineering role, with demonstrated leadership in quality strategy and automation.
  • Strong software engineering fundamentals with hands-on experience in Python, Swift, or both, and a track record of building test automation frameworks, CI/CD pipelines, or evaluation infrastructure for complex software systems.
  • Experience with agentic coding systems, using AI-assisted development tools to accelerate implementation, prototype evaluation pipelines, and tackle complex engineering problems with speed and precision.
  • Familiarity with machine learning concepts and LLM-based systems - including evaluation methodologies, prompt design, and model behavior analysis.

Nice To Haves

  • Experience with on-device AI, natural language understanding, or conversational systems is a strong plus, as is familiarity with scenario-based testing or agent trajectory analysis.
  • Prior work on quality or reliability for consumer-facing AI products is especially valued.

Responsibilities

  • Design and maintain end-to-end automated test suites for Siri across iOS, macOS, iPadOS, CarPlay, and other Apple platforms.
  • Author and scale evaluation scenarios that reflect real-world user intent and on-device context.
  • Investigate and triage complex failures across Siri's AI stack: planner behavior, tool execution, search, context retrieval, and response generation.
  • Distinguish true product regressions from infrastructure noise, and drive root cause analysis to clear, actionable outcomes.
  • Partner with engineering, ML, and product experience teams to define quality metrics, track regressions, and validate improvements before they ship.