About The Position

Amazon Leo is establishing a 24/7 Network Operations Center (NOC) to provide proactive monitoring and rapid incident response for Leo's satellite network service. We are seeking an experienced Operations Manager to lead the U.S.-based NOC team in Redmond, Washington as part of our geographically distributed operations supporting the Leo program. This role will manage a team of approximately 10 Support Engineers and Cloud Support Engineers providing 24/7 coverage, responsible for continuous monitoring of the Leo network service and rapid incident response. You will work closely with the Sr SDM and your counterpart in London to ensure seamless global operations.

Export Control Requirement: Due to applicable export control laws and regulations, candidates must be a U.S. citizen or national, U.S. permanent resident (i.e., current Green Card holder), or lawfully admitted into the U.S. as a refugee or granted asylum.

The Leo Network Operations Center is a new, strategic function within the Leo organization. As part of the U.S. team, you'll help build the operational foundation for Leo's satellite network service from the ground up. You'll work with observability tools, collaborate with expert engineering teams, and operate at unprecedented scale. This is an opportunity to establish best practices, develop a high-performing team, and play a critical role in delivering low-latency, high-speed broadband connectivity to unserved and underserved communities around the world. The NOC team works closely with Mission Operations and maintains the health and performance of the Leo network through proactive monitoring and rapid incident response.

Requirements

  • Bachelor's degree in computer science, engineering, analytics, mathematics, statistics, IT or equivalent, or experience with networking and troubleshooting (TCP/IP, DNS, routing, switching, firewalls, LAN/WAN, traceroute, iperf, dig, cURL or related) at an advanced level
  • 4+ years of network and operating system support, or 2+ years of relevant technical position experience
  • 3+ years of people management experience leading technical teams

Nice To Haves

  • Experience in large-scale network operations centers or mission-critical environments
  • Experience with observability and monitoring platforms (Grafana, Prometheus, CloudWatch, or similar)
  • Knowledge of ITIL frameworks and incident management best practices
  • Experience with ticketing systems and workflow automation
  • Background in telecommunications, ISP operations, or satellite communications
  • Understanding of SLA management and operational metrics

Responsibilities

  • Lead and develop a team of approximately 10 Support Engineers and Cloud Support Engineers in Redmond, Washington
  • Manage 24/7 shift operations to provide continuous coverage
  • Oversee continuous monitoring of Leo network health at spot level (groups of customer terminals) and regional aggregations
  • Ensure the team performs initial triage, documents incidents, and manages incident response workflows through resolution
  • Coordinate with subject matter experts on complex issues requiring specialized technical expertise
  • Maintain communication with stakeholders during active incidents and provide status updates
  • Implement and refine Standard Operating Procedures (SOPs) for incident response, escalation, monitoring, and shift operations
  • Drive adherence to established runbooks and troubleshooting guides
  • Ensure proper ticket lifecycle management and documentation standards
  • Conduct shift handoff procedures and knowledge transfer protocols
  • Lead post-incident reviews to capture lessons learned and identify improvement opportunities
  • Oversee team use of observability tools including Grafana dashboards
  • Monitor alarm systems for spot-level outages and ensure timely response
  • Review dashboards for anomalies, trends, and performance degradation
  • Partner with London Operations Manager to ensure seamless 24/7 global coverage
  • Collaborate with Mission Operations, Customer Service Agents (CSAs), and Business Customer Experience (BCX) teams
  • Work with engineering teams to identify automation opportunities and improve observability
  • Track and report on key performance indicators including time-to-detection and time-to-resolution
  • Identify trends in incident types and work with engineering to prevent recurrence
  • This role requires flexibility to support 24/7 operations, including occasional off-hours support during major incidents
  • Occasional travel to London and other operational sites (estimated 10-15%)
  • May require participation in on-call rotation for management escalations

Benefits

  • health insurance (medical, dental, vision, prescription, Basic Life & AD&D insurance and option for Supplemental life plans, EAP, Mental Health Support, Medical Advice Line, Flexible Spending Accounts, Adoption and Surrogacy Reimbursement coverage)
  • 401(k) matching
  • paid time off
  • parental leave