Senior Site Reliability Engineer, Apple Data Platform Infra SRE

Apple • Cupertino, CA

55d

About The Position

At Apple, we believe that innovation flourishes in an environment where ideas are challenged, collaboration is encouraged, and technology is pushed to its limits. This environment is only possible when diverse minds come together, bringing unique perspectives and experiences. Our people and their ideas inspire innovation in everything we do. Imagine what you could accomplish here! Join Apple and help us make the world a better place.

As a principal contributor in our Apple Data Platform SRE organization, you will apply SRE principles as you mentor and partner with our engineers and partner teams, ensuring petabyte-scale analytics infrastructure runs reliably and efficiently. This role focuses on managing bare-metal and cloud-based infrastructure, leveraging and extending our infrastructure-as-code based tooling, analyzing and optimizing performance, helping to plan and execute long-term fleet management logistics, capacity planning, and ultimately maintaining operational excellence across distributed data platforms that power analytics across Apple. This role includes production on-call responsibilities.

Description

Apple Service Engineering (ASE) teams build and scale the platforms and infrastructure behind many of Apple's services (such as iCloud, iTunes, Siri, and Maps). We are the foundation on which Apple's software developers build the products that our customers love. We are looking for a passionate and dedicated Senior Site Reliability Engineer to provide technical leadership on our team to help ensure our customers have the highest quality Apple Services experience.

The Apple Data Platform (ADP) Compute SRE team is responsible for the core infrastructure, including our legacy bare-metal platforms and modern cloud-based infrastructure stack. We partner with both peer SRE teams and several of our world-class software and product engineering teams to support infrastructure reliability, multi-year parallel migrations for Apple properties, as well as the automation, tooling, incident, and process management necessary to ensure smooth 24x7 operations for ADP customers.

Requirements

BS/MS in Computer Science or equivalent
2+ years of experience in Site Reliability Engineering, managing infrastructure and services at scale
5+ years of experience in management or technical leadership roles
History of end-to-end project management and delivery
Demonstrable programming skills to both develop software/tools and lead code reviews
Experience managing Hadoop and Kubernetes infrastructure and related services, or equivalent experience
Advanced knowledge of Linux, Networking, and Containers

Nice To Haves

15+ years of experience in SRE or related work managing infrastructure at scale
Experience with scale testing, disaster recovery, and capacity planning
Ability to define the technical roadmap for infrastructure and drive cross-functional alignment on architectural standards and best practices

Responsibilities

Managing bare-metal and cloud-based infrastructure
Leveraging and extending our infrastructure-as-code based tooling
Analyzing and optimizing performance
Helping to plan and execute long-term fleet management logistics
Capacity planning
Maintaining operational excellence across distributed data platforms that power analytics across Apple
Taking on production on-call responsibilities
Supporting infrastructure reliability
Supporting multi-year parallel migrations for Apple properties
Providing the automation, tooling, incident, and process management necessary to ensure smooth 24x7 operations for ADP customers