About The Position

At Apple, we believe that innovation flourishes in an environment where ideas are challenged, collaboration is encouraged, and technology is pushed to its limits. This environment is only possible when diverse minds come together, bringing unique perspectives and experiences. Our people and their ideas inspire innovation in everything we do. Imagine what you could accomplish here! Join Apple and help us make the world a better place.

As a principal contributor in our Apple Data Platform SRE organization, you will apply SRE principles as you mentor and partner with our engineers and partner teams, ensuring that petabyte-scale analytics infrastructure runs reliably and efficiently. This role focuses on managing bare-metal and cloud-based infrastructure, leveraging and extending our infrastructure-as-code tooling, analyzing and optimizing performance, helping to plan and execute long-term fleet management logistics, capacity planning, and ultimately maintaining operational excellence across the distributed data platforms that power analytics across Apple. This role includes production on-call responsibilities.

Description

Apple Service Engineering (ASE) teams build and scale the platforms and infrastructure behind many of Apple's services (such as iCloud, iTunes, Siri, and Maps). We are the foundation on which Apple's software developers build the products that our customers love. We are looking for a passionate and dedicated Senior Site Reliability Engineer to provide technical leadership on our team and help ensure our customers have the highest quality Apple Services experience. The Apple Data Platform (ADP) Compute SRE team is responsible for the core infrastructure, including our legacy bare-metal platforms and modern cloud-based infrastructure stack. We partner with peer SRE teams and several of our world-class software and product engineering teams to support infrastructure reliability, multi-year parallel migrations for Apple properties, and the automation, tooling, incident, and process management necessary to ensure smooth 24x7 operations for ADP customers.

Requirements

  • BS/MS in Computer Science or equivalent
  • 2+ years of experience in Site Reliability Engineering, managing infrastructure and services at scale
  • 5+ years of experience in management or technical leadership roles
  • History of end-to-end project management and delivery
  • Demonstrable programming skills to both develop software/tools and lead code reviews
  • Experience managing Hadoop and Kubernetes infrastructure and related services, or equivalent experience
  • Advanced knowledge of Linux, Networking, and Containers

Nice To Haves

  • 15+ years of experience in SRE or related work managing infrastructure at scale
  • Experience with scale testing, disaster recovery, and capacity planning
  • Ability to define the technical roadmap for infrastructure and drive cross-functional alignment on architectural standards and best practices

Responsibilities

  • Managing bare-metal and cloud-based infrastructure
  • Leveraging and extending our infrastructure-as-code tooling
  • Analyzing and optimizing performance
  • Helping to plan and execute long-term fleet management logistics
  • Capacity planning
  • Maintaining operational excellence across distributed data platforms that power analytics across Apple
  • Carrying production on-call responsibilities
  • Supporting infrastructure reliability
  • Supporting multi-year parallel migrations for Apple properties
  • Providing the automation, tooling, incident, and process management necessary to ensure smooth 24x7 operations for ADP customers