About The Position

The Cloud Infrastructure Business Operations (CIBO) team is a centralized organization responsible for demand planning of infrastructure resources across Apple. We are seeking a Manager of ML Compute Capacity Planning to lead capacity planning efforts for Apple's ML Training and Gen AI platforms. These platforms provide services to all internal Apple developers, delivering efficient and scalable compute and processing for the machine learning lifecycle, from model experimentation to deployment, across the entire Apple consumer ecosystem. We're looking for a strategic leader with deep expertise in capacity planning, demand forecasting, and infrastructure optimization for large-scale ML compute environments. In this role, you will build and lead a team of capacity planners responsible for ensuring Apple's ML and Gen AI infrastructure meets current and future demand. This includes developing long-range capacity models, driving accelerator hardware strategy, managing supply/demand balance, and partnering with finance on investment planning. You will serve as the central capacity planning voice for interactions with public cloud providers and internal Apple Cloud teams, ensuring Apple has the right compute resources, in the right place, at the right time, and at the right cost.

Requirements

  • 8+ years of experience in capacity planning, infrastructure strategy, or technical operations, with at least 3 years in a people management role
  • Deep expertise in capacity planning methodologies, demand forecasting, and resource optimization for large-scale compute environments
  • Experience with ML and Gen AI compute infrastructure, including GPU/TPU accelerators, data center operations, and cloud platforms
  • Proven track record in supply/demand management and financial operations related to infrastructure investments
  • Strong analytical skills with proficiency in capacity modeling, cost-performance trade-off analysis, and scenario planning
  • Experience evaluating and onboarding new accelerator technologies into production environments
  • Demonstrated ability to build, develop, and inspire high-performing teams
  • Outstanding interpersonal and communication skills, with the ability to influence hardware vendors, cloud providers, finance partners, and senior engineering leadership
  • Strategic mindset with the ability to balance long-term planning with near-term execution
  • BS in Engineering, CS, Systems Engineering, or Supply Chain, or equivalent work experience

Nice To Haves

  • MBA or MS in Engineering, CS, Systems Engineering, Supply Chain, or related field
  • 10 or more years of experience with 5 or more years in management roles
  • Experience leading capacity planning for ML/AI and Gen AI infrastructure at hyperscale

Responsibilities

  • Developing long-range capacity models
  • Driving accelerator hardware strategy
  • Managing supply/demand balance
  • Partnering with finance on investment planning
  • Serving as the central capacity planning voice for interactions with public cloud providers and internal Apple Cloud teams
  • Ensuring Apple has the right compute resources, in the right place, at the right time, and at the right cost