About The Position

Scaling machine learning workloads across thousands of accelerators creates challenges that few engineers ever encounter. In Apple’s Machine Learning Platform Technologies organization, we build the infrastructure that powers large-scale ML training and inference workloads, bringing together expertise in distributed systems, machine learning infrastructure, and high-performance computing. As a capacity engineer on the ML Compute Capacity team, you’ll build tools, conduct analysis, and implement processes to ensure the optimal operation and growth of our infrastructure. This includes working with infrastructure engineers and ML teams to ensure compute resources are used effectively, and shaping strategic procurement decisions.

Requirements

  • Ability to translate complex data into easy-to-understand, actionable insights and recommendations
  • Experience crafting robust queries over large-scale, multi-source data
  • Proficiency in scripting, automation or modeling tools
  • Experience working on cross-functional projects spanning ML research, infrastructure and finance
  • Bachelor’s degree or higher in Engineering, Data Science, Economics or a related quantitative field

Nice To Haves

  • Experience developing ML models to surface insights and drive solutions
  • Familiarity with accelerator utilization patterns across ML training and inference
  • Familiarity with cloud compute, storage, network and services
  • Comfort developing with modern web frameworks and RESTful APIs

Responsibilities

  • Build tools
  • Conduct analysis
  • Implement processes to ensure the optimal operation and growth of our infrastructure
  • Work with infrastructure engineers and ML teams to ensure compute resources are used effectively
  • Shape strategic procurement decisions