Cloud Monitoring SRE Manager

Apple-posted 4 months ago

Full-time • Manager

Seattle, WA

5,001-10,000 employees

People at Apple don't just build products; they craft the kind of experiences that have revolutionized entire industries. The diverse collection of our people and their ideas inspires innovation in everything we do. Imagine what you could do here! Join Apple, and help us leave the world better than we found it.

The Apple Service Engineering (ASE) team builds and provides systems and infrastructure that fuel Apple's services (such as iCloud, iTunes, Siri, and Maps). We are the foundation on which Apple's software developers build the products that our customers love. We are looking for a passionate and dedicated Site Reliability Engineering Manager to lead a team that focuses on providing our customers the highest quality Apple Services experience. Our services have to scale globally, stay highly available, and 'just work.' If you love designing, engineering, and running systems and infrastructure that will help millions of customers, then this is the place for you!

The Cloud Monitoring SRE organization is specifically tasked with enabling other teams to better understand their infrastructure and services, providing extraordinary observability capabilities. Keeping Apple services up and running 100% of the time is a critical job. Accurately monitoring the health of every application and piece of infrastructure that makes up the Apple ecosystem 100% of the time is an order of magnitude more challenging. As a Site Reliability Engineering Manager for the Cloud Monitoring Team at Apple, you will build and mentor a team to improve the reliability and performance of the software systems that provide access to the services and infrastructure that run Apple. Our monitoring, alerting, and visualization platform analyzes billions of metrics per minute and forms the central nervous system of Apple's architecture.

Lead a team responsible for providing the platform for mission-critical observability services.
Maintain constant uptime and scale seamlessly for new applications and services.
Collaborate with developers and architects to aid in design and implementation.
Improve stability, security, and scalability of software systems.
Mentor team members to meet their career goals and the organization's goals.

5+ years of experience handling services in a large-scale environment.
Experience with hiring and leading engineers.
Experience with Cloud Computing technologies (particularly Kubernetes).
Experience and confidence around incident response and incident management.
Experience with the Prometheus ecosystem.
Practical experience in Python and bash scripting.
Theoretical knowledge of Go, Java, and/or Scala.
Strong sense of ownership and integrity demonstrated through clear communication and collaboration.

2+ years professional experience in an engineering leadership position.
Comfortable with open-source configuration management and orchestration tools (such as Helm, Puppet, and Spinnaker).
Experience in running and scaling distributed systems in a public, private, or hybrid cloud environment.
Bachelor's or Master's degree in computer science or a similar field, or equivalent experience.
