Operations Manager, Leo Network Operations Center

Amazon•Redmond, WA

Onsite

About The Position

Amazon Leo is establishing a 24/7 Network Operations Center (NOC) to provide proactive monitoring and rapid incident response for Leo's satellite network service. We are seeking an experienced Operations Manager to lead the U.S.-based NOC team in Redmond, Washington as part of our geographically distributed operations supporting the Leo program. This role will manage a team of approximately 10 Support Engineers and Cloud Support Engineers providing 24/7 coverage, responsible for continuous monitoring of the Leo network service and rapid incident response. You will work closely with the Sr SDM and your counterpart in London to ensure seamless global operations.

Export Control Requirement: Due to applicable export control laws and regulations, candidates must be a U.S. citizen or national, U.S. permanent resident (i.e., current Green Card holder), or lawfully admitted into the U.S. as a refugee or granted asylum.

The Leo Network Operations Center is a new, strategic function within the Leo organization. As part of the U.S. team, you'll help build the operational foundation for Leo's satellite network service from the ground up. You'll work with observability tools, collaborate with expert engineering teams, and operate at unprecedented scale. This is an opportunity to establish best practices, develop a high-performing team, and play a critical role in delivering low-latency, high-speed broadband connectivity to unserved and underserved communities around the world. The NOC team works closely with Mission Operations and maintains the health and performance of the Leo network through proactive monitoring and rapid incident response.

Requirements

Bachelor's degree in computer science, engineering, analytics, mathematics, statistics, IT or equivalent, or experience with networking and troubleshooting (TCP/IP, DNS, routing, switching, firewalls, LAN/WAN, traceroute, iperf, dig, cURL or related) at an advanced level
4+ years of network and operating system support, or 2+ years of relevant technical position experience
3+ years of people management experience leading technical teams

Nice To Haves

Experience in large-scale network operations centers or mission-critical environments
Experience with observability and monitoring platforms (Grafana, Prometheus, CloudWatch, or similar)
Knowledge of ITIL frameworks and incident management best practices
Experience with ticketing systems and workflow automation
Background in telecommunications, ISP operations, or satellite communications
Understanding of SLA management and operational metrics

Responsibilities

Lead and develop a team of approximately 10 Support Engineers and Cloud Support Engineers in Redmond, Washington
Manage 24/7 shift operations to provide continuous coverage
Oversee continuous monitoring of Leo network health at spot level (groups of customer terminals) and regional aggregations
Ensure the team performs initial triage, documents incidents, and manages incident response workflows through resolution
Coordinate with subject matter experts on complex issues requiring specialized technical expertise
Maintain communication with stakeholders during active incidents and provide status updates
Implement and refine Standard Operating Procedures (SOPs) for incident response, escalation, monitoring, and shift operations
Drive adherence to established runbooks and troubleshooting guides
Ensure proper ticket lifecycle management and documentation standards
Conduct shift handoff procedures and knowledge transfer protocols
Lead post-incident reviews to capture lessons learned and identify improvement opportunities
Oversee team use of observability tools including Grafana dashboards
Monitor alarm systems for spot-level outages and ensure timely response
Review dashboards for anomalies, trends, and performance degradation
Partner with London Operations Manager to ensure seamless 24/7 global coverage
Collaborate with Mission Operations, Customer Service Agents (CSAs), and Business Customer Experience (BCX) teams
Work with engineering teams to identify automation opportunities and improve observability
Track and report on key performance indicators including time-to-detection and time-to-resolution
Identify trends in incident types and work with engineering to prevent recurrence
This role requires flexibility to support 24/7 operations, including occasional off-hours support during major incidents
Occasional travel to London and other operational sites (estimated 10-15%)
May require participation in on-call rotation for management escalations

Benefits

health insurance (medical, dental, vision, prescription, Basic Life & AD&D insurance and option for Supplemental life plans, EAP, Mental Health Support, Medical Advice Line, Flexible Spending Accounts, Adoption and Surrogacy Reimbursement coverage)
401(k) matching
paid time off
parental leave
