About The Position

The ADP ML Data Platform team enables future Apple intelligent products by providing Apple engineers with cutting-edge ML technologies and large-scale compute and data systems designed specifically for machine learning. You will build the data foundation that powers ML training across Apple.

Our team enables governed, scalable sharing of text and multimodal datasets, ensuring teams can safely discover, access, and use high-quality data for training. We focus on turning raw data into usable training assets by streamlining data preparation, enabling rapid iteration, and supporting advanced techniques such as synthetic data workflows. Our goal is to remove friction between data creation and model experimentation so teams can move from idea to training quickly and confidently.

Most critically, we optimize how data is consumed during training. We improve GPU utilization and reduce training bottlenecks through deep benchmarking, profiling, and system-level optimization of data pipelines. This includes designing high-performance data access patterns for large-scale distributed workloads and ensuring reliability and efficiency at scale.

You will operate at the intersection of ML systems and infrastructure, partnering with model teams to improve end-to-end training performance, eliminate inefficiencies, and raise the bar on reproducibility and governance. We are looking for engineers with strong experience in large-scale training systems, performance optimization, and data-intensive ML workloads. If you care about maximizing efficiency, designing scalable data architectures, and enabling the next generation of generative AI models, this role offers the scope and impact to do exactly that.

Requirements

  • Strong foundation in machine learning systems, with hands-on experience in large-scale training workflows and data-intensive ML pipelines
  • Deep understanding of training performance optimization, including profiling, benchmarking, and eliminating data bottlenecks in distributed environments
  • Experience building production-grade ML data or training infrastructure with strong guarantees around reproducibility, versioning, and governance
  • Proven ability to design high-throughput, low-latency data pipelines for large-scale GPU workloads
  • Familiarity with modern foundation models and multimodal training workloads
  • Experience operating and debugging distributed systems in large-scale production environments
  • Strong systems programming skills in Python and at least one of Java or Go
  • Ability to work cross-functionally with research, infrastructure, and product teams to improve end-to-end ML performance
  • Comfortable operating in fast-moving, ambiguous problem spaces with evolving technical requirements
  • B.S., M.S., or Ph.D. in Computer Science or Computer Engineering, or equivalent practical experience

Nice To Haves

  • Experience driving platform-wide improvements in data efficiency, resilience, and observability across distributed environments
  • Experience diagnosing and resolving complex cross-stack performance issues, from data ingestion through training execution, while ensuring reliability at scale