Cloud Network Reliability Engineer

Apple, Sunnyvale, CA

About The Position

As a technical leader within the Cloud Networking organization, you will define and drive the reliability and resiliency architecture for Apple's network platform services. You will be responsible for establishing SRE and SWE best practices, architecting fault-tolerant network control and data planes, and championing data-driven decision-making through observability and automation. You will drive resilient cloud networking solutions that operate reliably across multiple cloud providers and global regions, handling failures gracefully and maintaining service availability. Your technical leadership will ensure Apple's network services meet demanding availability, latency, resilience, and security requirements while continuously improving operational maturity. We are looking for a technical expert who deeply understands cloud networking at scale and is passionate about operating mission-critical, globally distributed infrastructure, preventing outages through proactive engineering, and driving long-term reliability improvements through architectural excellence.

Requirements

  • Extensive experience in software engineering, systems engineering, or infrastructure engineering.
  • Strong background in designing, operating, and supporting highly available, fault-tolerant distributed systems at hyperscale.
  • Strong systems programming skills, including multi-threading, concurrency, caching, and batching.
  • Solid understanding of network infrastructure and software-defined networking (SDN).
  • Ability to lead cross-functional collaboration and influence technical decisions across teams.

Nice To Haves

  • Expert knowledge of API design and interface technologies (JSON, ProtoBuf, REST, RPC, XML, etc.).
  • In-depth knowledge of K8s, OpenStack, system virtualization, build systems, and infrastructure as code.
  • Strong knowledge of observability systems (metrics, logging, tracing) and qualification engineering.
  • Broad knowledge of networking solutions across OSI layers 3 through 7.
  • Excellent written and verbal communication skills with the ability to clearly articulate risk, reliability trade-offs, and operational priorities.
  • Proven ability to manage competing priorities, drive initiatives to completion, and deliver results in fast-paced environments.