Apple · Posted 2 months ago
Senior
Austin, TX
5,001-10,000 employees
Computer and Electronic Product Manufacturing

We are seeking an Engineering Program Manager (EPM) to lead large-scale Site Reliability Engineering (SRE) initiatives that underpin the resilience, scalability, and performance of our cloud-native services. This senior role requires strategic thinking, program leadership, and deep collaboration across engineering, operations, and product to drive reliability outcomes at scale. You will be a key partner to senior engineering leaders, ensuring alignment of priorities, disciplined execution, and operational excellence across the SRE portfolio.

Our organization works with many cross-functional teams across the company. We're looking for an intellectually curious and creative individual who is comfortable operating in ambiguity: a strategic and operational thinker with strong analytical and problem-solving skills, a passion for process improvement and operational efficiency, and a drive to help deliver some of Apple's most important product goals through operational execution. You will work directly with our cross-functional team across Global Operations to execute global projects from inception to launch.

  • Lead large-scale Site Reliability Engineering (SRE) initiatives.
  • Drive reliability outcomes at scale through strategic thinking and program leadership.
  • Collaborate across engineering, operations, and product teams.
  • Ensure alignment of priorities and disciplined execution across the SRE portfolio.
  • Execute global projects from inception to launch with cross-functional teams.
  • Manage and influence executive stakeholders without direct authority.
  • Drive root cause analysis for incidents and identify corrective actions.
  • Experience in technical program management, service delivery, or engineering leadership.
  • Proven track record of leading large, multi-team programs in highly available, large-scale distributed systems or cloud environments.
  • Strong understanding of SRE practices, DevOps principles, and modern infrastructure (Kubernetes, containers, cloud platforms like AWS/Azure/GCP).
  • Knowledge of DevOps, continuous delivery, and various AWS services.
  • Deep understanding of incident management processes and best practices.
  • Ability to drive root cause analysis and follow up to closure.
  • Demonstrated success in executive stakeholder management.
  • Excellent communication, negotiation, and presentation skills.
  • Experience with Splunk, NetScaler, operating systems, servers, storage, databases, backup, load balancers, DMZ, WAF, networking, Citrix, VMware, and Linux.