About The Position

Apple is where individual imaginations gather together, committing to the values that lead to great work. Every new product we build, service we create, or Apple Store experience we deliver is the result of us making each other’s ideas stronger. That happens because every one of us shares a belief that we can make something wonderful and share it with the world, changing lives for the better. It’s the diversity of our people and their thinking that inspires the innovation that runs through everything we do. When we bring everybody in, we can do the best work of our lives. Here, you’ll do more than join something; you’ll add something.

At Apple, we create products and services that have changed entire industries. Our diverse team of people and their ideas inspire innovation in everything we do. Imagine what you could do here! Join Apple and help us make the world a better place.

Edge Services is responsible for the foundational services that every Apple team and billions of customer devices rely on. Our services need to be highly available, scale for global reach, and just work. If you love designing, engineering, and running systems that will help our customers, then this is the perfect place for you!

The Edge Services team is looking for a software engineer to champion the evolution of our production ecosystems. In this role, you will help drive our vision for production visibility, moving beyond simple uptime metrics to build a sophisticated, data-driven reliability framework. You will play a pivotal role in ensuring our services are resilient, scalable, and observable, bridging the gap between complex distributed systems and seamless user experiences.

We’re seeking an engineer who is passionate about building system software and solving seemingly insurmountable problems, and who is deeply committed to delivering an outstanding customer experience.
You'll go beyond the industry standard, demonstrating creativity in problem-solving, the ability to think dynamically, and the agility to adapt quickly to new technical areas.

Requirements

  • Systems Expertise: Strong understanding of Linux internals and deep networking expertise, including HTTP/2, HTTP/3 (QUIC), and HTTPS/TLS. You should be comfortable debugging protocol-level issues and optimizing traffic flow.
  • Automation Mindset: Proven ability to automate repetitive tasks and complex workflows using Python or Go.
  • Observability Logic: Experience configuring and managing modern monitoring suites (e.g., Prometheus, Grafana, ClickHouse) with a focus on creating actionable, high-signal alerting.
  • CS Fundamentals: Solid grasp of Data Structures and Algorithms (DSA) to write efficient, performant code and troubleshoot complex system bottlenecks.
  • SRE Principles: Practical knowledge of SLIs, SLOs, Error Budgets, Release Management and Incident Management to drive engineering priorities.

Nice To Haves

  • Infrastructure as Code: Experience managing cloud environments (AWS, GCP, or Azure) using Terraform, Ansible, or Pulumi.
  • Orchestration: Hands-on experience scaling and securing containerized workloads via Kubernetes.
  • Incident Response: A track record of leading "blameless post-mortems" and using those insights to harden the system against future failures.
  • Architectural Influence: Ability to consult with product teams on service design to improve long-term maintainability.
  • Reliability Engineering: A proactive engineering mindset focused on shifting from "fixing things when they break" to "designing things so they don't break" (or so they fail gracefully).

What This Job Offers

  • Job Type: Full-time
  • Career Level: Mid Level
  • Education Level: No Education Listed
  • Number of Employees: 5,001-10,000
