About The Position

The Cluster Operations Leader manages Amazon Data Center Clusters and Colocation Operations within their assigned region. As the senior Infrastructure Operations leader, they oversee safety, security, availability, scaling, efficiency, and cost management.

Infrastructure Operations consists of two core functions: Data Center Operations (DCO) and Data Center Engineering Operations (DCEO). DCO manages server-level platforms supporting Amazon Retail and Amazon Web Services, while DCEO handles mechanical, electrical, and controls systems for critical environments. We seek a leader with experience in both DCO and DCEO systems. A physical security team, though it operates independently, works closely with these functions to protect people, assets, and customer data.

The Cluster Operations Leader builds and leads high-performing teams across these functions. They manage daily operations while applying technical expertise to address emerging challenges. The role requires both strategic oversight and the ability to dive deep into specific issues. As a vital member of the management team, the Cluster Operations Leader helps scale the world's largest cloud computing infrastructure. The position demands innovation to solve complex daily challenges and drive operational excellence. Success in this role requires strong technical knowledge of both engineering (electrical and mechanical) and compute systems, process optimization skills, and dedication to achieving world-class operational performance.

Requirements

  • 10+ years of relevant management experience in datacenter operations, facility engineering operations, information technology critical environment facilities, advanced high volume manufacturing, datacenter build-outs and scaling, or similar experience
  • Bachelor's or Master's degree in Engineering, Computer Science, or a related field, or relevant industry experience
  • Demonstrated track record of delivering complex analytical and quantitative projects and communicating the results clearly, both verbally and in writing
  • Proven ability to hire, develop and manage high-performing geographically distributed technical teams
  • Excellent written and verbal communication skills
  • Knowledge of mechanical, electrical, and controls systems for critical infrastructure
  • Expertise in one or more continuous improvement methodologies such as Lean or Six Sigma
  • Broad knowledge of information technology infrastructure domains such as compute server platforms, storage server platforms, server components, network devices, technologies, and architectures, and of IT service delivery principles and best practices

Responsibilities

  • Hire, manage, and develop the operations team, including compute and engineering operations managers and their teams
  • Achieve organizational performance goals for safety, security, availability, scaling, efficiency, and cost
  • Plan and execute Infrastructure Operations for new Data Centers and Colocation expansions
  • Operate and maintain mechanical, electrical, and controls systems in Amazon Data Centers, including preventive maintenance, corrective maintenance, and change management
  • Manage Colocation Data Center service providers to meet or exceed contracted performance SLAs
  • Lead safety, security, and availability incident response, management, and resolution
  • Continuously improve operational processes, procedures, methods, and tools

Benefits

  • Health insurance (medical, dental, vision, prescription, Basic Life & AD&D insurance and option for Supplemental Life plans, EAP, Mental Health Support, Medical Advice Line, Flexible Spending Accounts, Adoption and Surrogacy Reimbursement coverage)
  • 401(k) matching
  • Paid time off
  • Parental leave