Sr Site Reliability Engineer, Customer Systems

Apple•Austin, TX

About The Position

Imagine what you could do here. Apple is a place where extraordinary people gather to do their best work. Together we craft products and experiences people once couldn't have imagined, and now can't imagine living without. If you're motivated by the idea of making a real impact, and by joining a team that prides itself on being part of one of the most diverse and inclusive companies in the world, we'd love to hear from you!

The Customer Systems Team is looking for an experienced Site Reliability Engineer. In this role you will design, build, and deliver highly scalable, reliable, and secure cloud infrastructure that powers the applications and services Apple's customers use every day. You will work closely with cross-functional teams, business leaders, and other partners across Apple to implement new solutions. If infrastructure as code, automation, and intelligent monitoring excite you, this is the job for you.

Requirements

5+ years of experience designing and building resilient, large-scale, low-latency cloud and on-premises infrastructure, including compute, storage, and networking
3+ years of experience deploying and managing Kubernetes using Helm
Experience with shell scripting, Python, or Ansible
Experience with monitoring using Splunk, Grafana, Prometheus, and Alertmanager
Deep understanding of networking protocols: DNS, TCP, HTTP/HTTPS
Experience setting up and managing CI/CD pipelines
Bachelor's or Master's degree in Computer Science, or equivalent experience
Excellent problem solving, critical thinking, and interpersonal skills
Good communication skills to collaborate with distributed teams
Ability to learn new technologies quickly

Nice To Haves

Experience with databases such as Cassandra, MongoDB, or Couchbase, and with AWS S3 or similar storage technologies
Experience deploying, monitoring, and supporting Java applications
Experience with ArgoCD and the GitOps model
Experience defining, monitoring, and achieving key operational targets such as MTTR and SLOs
Experience using GenAI tools to automate infrastructure management workflows

Responsibilities

Innovate, architect, build, and document highly available, scalable, reliable, and secure infrastructure
Troubleshoot application-specific, network, system, and performance issues
Build and maintain CI/CD infrastructure to enable fast delivery cycles for software engineering teams
Envision and build automation tools to deliver infrastructure services reliably and repeatably
Collaborate with other site reliability engineers, software engineers, and quality engineers to gather, define, and analyze non-functional and technical requirements