SRE Manager, ML Operations

Apple | New York, NY

About The Position

We are looking for a senior engineering leader to manage and grow our Site Reliability Engineering team, with a focus on ML Operations. This team owns the reliability, performance, and scalability of the Ad Serving infrastructure, the critical front door of Apple Ads, which operates at one of the largest scales in the industry. In this high-impact leadership role, you will shape the future of how we build, run, and evolve our ML Platforms and Services globally. You will bring deep technical expertise while staying anchored to business and product goals, and you will cultivate a team culture defined by operational excellence, innovation, and continuous improvement.

Requirements

  • 10+ years of experience with large-scale distributed systems
  • 5+ years of experience in an engineering leadership role, ideally managing SRE or Production Engineering teams
  • Proven track record of building and leading high-performing engineering teams
  • Strong grasp of core operating system principles, networking fundamentals, and systems management
  • Deep understanding of SRE principles: monitoring, alerting, error budgets, fault analysis, capacity planning, and incident response
  • Excellent problem-solving, communication, and decision-making skills

Nice To Haves

  • Bachelor's or Master's degree in Computer Science or a related field
  • Experience managing and optimizing GPU-based clusters in production environments
  • Experience building and operating large-scale ML systems or ML infrastructure
  • Hands-on experience managing cloud infrastructure, particularly AWS
  • Familiarity with the digital advertising ecosystem and its technical demands
  • Demonstrated ability to influence and partner across Product, Data Science, and Platform Engineering organizations

Responsibilities

  • Manage and grow our Site Reliability Engineering team, with a focus on ML Operations
  • Own the reliability, performance, and scalability of the Ad Serving infrastructure
  • Shape the future of how we build, run, and evolve our ML Platforms and Services globally
  • Cultivate a team culture defined by operational excellence, innovation, and continuous improvement