SRE

Apple•Cupertino, CA

About The Position

This is a rare and exciting opportunity to work on some of the world's most impactful internet services, including the App Store, Music, Books, Podcasts, and Fitness+, within Apple Services Engineering. These are revenue-critical, globally scaled services used on billions of devices worldwide. As an SRE, your mission is to ensure these services are always available, performant, and ready for growth. You will solve complex problems at scale, develop deep troubleshooting expertise, and keep a relentless focus on availability, latency, performance, and capacity. You will drive a culture of operational excellence by replacing toil with automation and by building systems that are resilient by design.

Requirements

Bachelor's degree in Computer Science, Engineering, or equivalent practical experience
3+ years of experience designing, analyzing, and troubleshooting large-scale distributed systems
2+ years of experience leading technical projects and providing engineering leadership
Strong programming or scripting skills (e.g., Python, Go, Java, or shell scripting)

Nice To Haves

Master's degree in Computer Science or Engineering
Experience with observability tooling, SLOs/SLIs, and capacity planning at scale
Familiarity with cloud infrastructure, container orchestration (e.g., Kubernetes), and CI/CD pipelines
Proven track record of driving automation to reduce operational toil in high-traffic production environments

Responsibilities

Own the full service lifecycle, from inception and architecture through deployment, operations, and continuous improvement
Partner with cross-functional teams to prepare services for production through system design consulting, deployment strategy, capacity planning, and production readiness reviews
Monitor, measure, and maintain service health across availability, latency, and performance dimensions
Drive sustainable scalability through automation and tooling, reducing manual operational overhead
Participate in on-call rotations, lead production incident response, and contribute to blameless postmortems
Establish and champion operational excellence practices across the organization
Collaborate closely with Software Engineering, Program Management, Security, and Infrastructure teams to embed reliability throughout the software development lifecycle
