Engineering Manager, Cloud Network Reliability

Apple • Sunnyvale, CA

About The Position

The Apple Cloud Networking team builds and operates large-scale, software-defined networking platforms that enable secure, resilient, and highly available multi-cloud connectivity with a global footprint. Our infrastructure powers critical Apple services, including iCloud, iTunes, Siri, and Maps.

We are seeking an experienced and visionary Reliability Engineering Manager to lead and grow a team of engineers focused on ensuring the availability, performance, scalability, and resiliency of Apple’s global network services. In this role, you will work closely with software engineering, infrastructure, and operations teams across Apple to deliver reliable, fault-tolerant systems that operate at massive scale.

As a key leader within the Cloud Networking organization, you will define and drive the reliability and resiliency strategy for Apple’s network platform services. You will be responsible for building, scaling, and mentoring a high-performing Production Engineering team that champions SRE and SWE best practices, release engineering, and data-driven decision-making. You will establish strong cross-functional partnerships to ensure reliability and resiliency are embedded throughout the system lifecycle, from design and development to deployment and operations. Your leadership will help ensure Apple’s network services meet demanding availability, latency, resilience, and security requirements while continuously improving operational maturity.

We are looking for a leader who is deeply passionate about operating mission-critical, globally distributed systems, preventing outages, learning from failures, and driving long-term reliability improvements.

Requirements

10+ years of experience in software engineering, systems engineering, or infrastructure engineering.
6+ years of experience in a technical leadership role with people management responsibilities.
Strong background in designing, operating, and supporting highly available, fault-tolerant distributed systems at scale.
Hands-on experience with reliability engineering, SRE, or large-scale production operations.
Solid understanding of network infrastructure and software-defined networking (SDN).
Ability to lead cross-functional collaboration and influence technical decisions across teams.

Nice To Haves

Experience in defining and operating SLO-based reliability and resiliency programs.
Strong knowledge of observability systems (metrics, logging, tracing) and qualification engineering.
Experience with microservices architectures, RESTful APIs, and cloud-native platforms.
In-depth understanding of networking protocols, routing mechanisms, and traffic management.
Broad knowledge of networking solutions across OSI layers 3 through 7.
Excellent written and verbal communication skills with the ability to clearly articulate risk, reliability trade-offs, and operational priorities.
Proven ability to manage competing priorities, drive initiatives to completion, and deliver results in fast-paced environments.
