About The Position

AWS Infrastructure Services owns the design, planning, delivery, and operation of all AWS global infrastructure. In other words, we’re the people who keep the cloud running. We support all AWS data centers and all of the servers, storage, networking, power, and cooling equipment that ensure our customers have continual access to the innovation they rely on. We work on the most challenging problems, with thousands of variables impacting the supply chain, and we’re looking for talented people who want to help. You’ll join a diverse team of software, hardware, and network engineers, supply chain specialists, security experts, operations managers, and other vital roles. You’ll collaborate with people across AWS to help us deliver the highest standards for safety and security while providing seemingly infinite capacity at the lowest possible cost for our customers. And you’ll experience an inclusive culture that welcomes bold ideas and empowers you to own them to completion.

The AWS Global Operations Support Engineering (GOSE) team is seeking a Global Operational Engineer (GOE) to serve as a technical resource and leader to drive the team’s long-term vision within the Data Center Community (DCC). The GOE will need a deep technical understanding of data center infrastructure, engineering, and operations. They act as a subject matter expert and are responsible for diving deep into global data center telemetry, alarm, and incident data. They rely on their skills in statistical data analysis, business intelligence engineering, and project management to identify business insights, produce visualizations and reports, and drive actions through scalable mechanisms. The GOE owns reporting on infrastructure availability, customer-impacting and system impairment events, and infrastructure alarms, with visibility up to vice presidents. They will ensure the event record database is kept up to date and accurate. The position will help ensure infrastructure availability and reliability performance metrics meet or exceed defined service levels, and that we achieve global resolution of any unplanned event risk through partnership with, and influence of, stakeholders such as Field Engineering, Operations, Controls, and Reliability/Quality teams.

If you are passionate about the Customer Experience, you think and act globally, you run towards problems, you boldly challenge others, and you want to contribute to the operational excellence of Amazon Data Centers, then this may be the challenge you are looking for!

Requirements

  • 3+ years of technical product or program management experience
  • Experience managing programs across cross-functional teams, building processes, and coordinating release schedules

Nice To Haves

  • 3+ years of experience working directly with engineering teams

Responsibilities

  • Deep dive into customer-impacting and non-customer-impacting event data and partner with stakeholder teams to drive global operational and engineering action items.
  • Investigate data center infrastructure alarms and telemetry to identify equipment failure trends and partner with stakeholder teams to drive solutions.
  • Review after-action reports (AARs) to ensure standards are met.
  • Create and continuously improve operational and infrastructure reports based on feedback from customers and the management team.
  • Ensure the accuracy of availability data sets and commit to the development of automation and process standards to continuously improve.
  • Aggregate and analyze data from multiple sources and compile it into a digestible and actionable format, mostly via QuickSight dashboards.
  • Contribute results and content into senior leadership availability documents and goals.
  • Engage with partner teams to review data analysis insights and drive corrective actions by utilizing processes, such as Correction of Errors (COE), AWS Customer Root Cause (RCA), Lessons Learned, and Global Action Items.
  • Communicate actions/recommendations to stakeholders in both written and verbal formats and ensure they are entered into a mechanism with an owner and due date.
  • Serve as liaison between the business and technical teams to achieve the goal of providing actionable insights into business performance and programs. This will require data gathering and manipulation, problem solving, and communication of insights and recommendations.
  • Develop and mentor other engineers involved in data collection, analysis and reporting.
  • Identify opportunities and initiate new operational availability programs and reports.

Benefits

  • health insurance (medical, dental, vision, prescription, Basic Life & AD&D insurance and option for Supplemental life plans, EAP, Mental Health Support, Medical Advice Line, Flexible Spending Accounts, Adoption and Surrogacy Reimbursement coverage)
  • 401(k) matching
  • paid time off
  • parental leave