Senior / Staff ML Engineer, Apple Ray, Apple Data Platform

Apple • Cupertino, CA


About The Position

Apple Ray integrates deeply with Apple’s data and ML ecosystem to provide a unified platform for building, orchestrating, and scaling complex ML and data pipelines. As a Software Engineer with an ML background, you will design distributed systems that support large-scale model training, tuning, and inference across heterogeneous compute environments, from bare-metal GPU clusters to cloud-native infrastructure. You will build features that make ML engineers more productive, improve resource efficiency, and advance the performance and reliability of Apple’s ML workloads. You’ll collaborate closely with ML practitioners to translate model and pipeline needs into robust platform capabilities, while also improving the underlying distributed runtime and control plane. This role requires strong engineering fundamentals, hands-on experience with ML systems, and a passion for building scalable infrastructure.

Requirements

6+ years building distributed systems, high-scale backend services, or compute runtimes.
Solid background in ML workflows, model training, model serving, or data pipeline development.
Proficiency in Python, plus strong experience in a systems-level language (C++, Rust, Go, or Java).
Experience with ML frameworks such as PyTorch or TensorFlow and familiarity with GPU-based training.
Understanding of parallelism strategies, model scaling, or distributed training concepts.
Experience with cluster orchestration (Kubernetes, EKS, GKE) or large-scale compute systems.
Strong debugging skills across distributed and ML-centric runtime environments.
Ability to work cross-functionally with ML engineers, data engineers, and infrastructure teams.
B.S., M.S., or Ph.D. in Computer Science, Machine Learning, or a related technical field, or comparable software engineering experience.

Nice To Haves

Experience with distributed training frameworks (DeepSpeed, Horovod, FSDP, ZeRO).
Background in optimizing GPU workloads or performance benchmarking.
Experience with model orchestration systems or ML platforms.
Contributions to open-source ML or distributed systems projects.
Familiarity with large-scale data systems such as Spark, Flink, or similar.
