About The Position

We live in a mobile and device-driven world where Deep Learning technology enables a new class of applications. We are looking for a software development engineer to design and build agentic systems for Large Language Model (LLM) evaluation and synthetic data generation. Imagine the countless possibilities powered by Artificial Intelligence! Are you passionate about enabling unique user experiences on Apple products such as Apple Vision Pro, iPhone, iPad, Apple Watch, and the Mac?

In the Video Engineering team, we are dedicated to providing hardware and software solutions for the execution of Deep Learning workloads. Our success is the result of very dynamic people working in an environment that cultivates creativity, partnership, and cross-functional collaboration. These elements come together to make Apple an amazing environment for motivated people to do the greatest work of their lives!

Description

As a Software Engineer in a test role, you will collaborate with world-class machine learning engineers and data scientists to understand the features you will support. In this role, you will create end-to-end automated evaluation pipelines that orchestrate multiple LLMs to generate test data, stress models, identify failure modes, and enable safe, scalable model deployment. This is a highly technical, hands-on role at the intersection of AI systems engineering, evaluation science, and automation.

Requirements

  • BS and a minimum of 3 years of relevant industry experience
  • Strong Python skills with experience building production-grade automation
  • Strong knowledge of the software development lifecycle, testing methodologies, and QA terminology and processes
  • Experience designing or implementing agentic or multi-step LLM workflows
  • Experience generating and validating synthetic data

Nice To Haves

  • 2+ years of experience in test automation or related areas, with a background in QA or test engineering
  • Experience with agent frameworks such as LangGraph, AutoGen, CrewAI or similar
  • Experience building human-in-the-loop evaluation systems
  • Knowledge of CI/CD pipelines for ML or evaluation workflows
  • Ability to multi-task and lead tasks with varying priorities
  • Experience with popular database management software, e.g. SQL-based systems
  • Excellent written and verbal communication skills, with the ability to describe and document work clearly