Machine Learning Efficiency Engineer - SIML, ISE

Apple Inc. • Seattle, WA

About The Position

Are you passionate about Generative AI and excited to work on groundbreaking modeling technologies that will enrich the lives of billions? The Intelligence System Experience (ISE) team within Apple's software organization is a multidisciplinary group operating at the intersection of Multimodal Foundation Models, Efficient and Scalable ML Infrastructure, and Personalized Intelligent Experiences. As a machine learning engineer on our team, you will design software systems and algorithms that enable performant, scalable training and inference for Apple's AI-driven experiences across both on-device and server environments. This role also includes opportunities to open-source your work. Join our team of highly skilled, impact-focused engineers!

We're seeking strong machine learning engineers to help build next-generation tools for large-scale deep learning. You'll join a team focused on accelerating training and inference speed, improving scalability, and advancing Apple's centralized ML platform. Candidates should bring polished coding skills and a passion for machine learning and computational science. We offer a respectful work environment, flexible responsibilities, and access to world-class experts and growth opportunities.

In this role, you will develop core components for our scalable ML platform, push the limits of existing training technologies, and create new techniques to overcome system constraints. Your work will be deployed on high-impact tasks across teams building Apple Intelligence products, and we encourage releasing your contributions as open source.

We are especially looking for a PyTorch-focused ML efficiency expert to optimize training and inference performance, improve distributed training throughput, and drive system-level efficiency for large-scale models. If you have deep experience with PyTorch internals and high-performance ML infrastructure, we'd love to hear from you.

Requirements

Experience developing model-parallel and data-parallel training solutions and other training optimizations.
Experience with parallel training libraries such as torch.distributed, DeepSpeed, or FairScale.
Experience with CUDA-level optimization.
Experience building ML models targeting Apple Silicon.
Experience building large-scale deep learning infrastructure or platforms for distributed model training.
Publication record at ML conferences such as MLSys or NeurIPS.