SHEIN • Posted 3 months ago
$230,239 - $235,239/Yr
Full-time • Senior

The Senior Manager, Site Reliability Engineering is responsible for maintaining a 24x7 production environment with a high level of service availability. The role involves performing quality reviews, managing operational issues, and defining a culture of operational excellence across infrastructure and engineering through SLAs, processes, and monitoring. The manager provides leadership and direction to engineers responsible for break-fix, uptime, and reliability for core services, distribution, network elements, and related interfaces.

The role also includes people-care management for team members: hiring, setting and monitoring annual performance plans, coaching, and career development. The manager ensures that proper knowledge and career development tools are in place to support ongoing team member development and, in collaboration with other engineering teams, sets clear expectations and creates a positive work environment based on accountability. The manager also works with other engineering managers to grow a culture of automation and reliability, continuously improve the 24/7 on-call and incident management process, and lead blameless post-mortems.

  • Maintain a 24x7 production environment with a high level of service availability.
  • Perform quality reviews and manage operational issues.
  • Define and drive a culture of operational excellence across infrastructure and engineering.
  • Provide leadership and direction to engineers responsible for break-fix, uptime, and reliability.
  • Manage people-care for team members, including hiring, performance plans, coaching, and career development.
  • Set clear expectations and create a positive work environment based on accountability.
  • Collaborate with other engineering teams.
  • Grow a culture of automation and reliability.
  • Continuously improve the 24/7 on-call and incident management process.
  • Lead blameless post-mortems.
  • Master’s degree in Electrical Engineering, Electronic and Information Engineering, Computer Science, or a related field.
  • 3 years of progressively responsible post-baccalaureate experience in the job offered or in any engineering-related role.
  • Experience in Linux system administration, including network and DNS management, performance tuning, resource optimization, and security practices.
  • Experience with cloud technology, including AWS and Azure, for scalable computing and globally distributed data management.
  • Experience architecting and maintaining robust data pipelines using serverless computing platforms such as AWS Lambda and Azure Functions.
  • Experience with object storage services such as AWS S3 and Azure Blob Storage.
  • Experience designing and managing cloud network architectures, including Virtual Private Clouds (VPC), subnets, load balancing, and DNS configurations.
  • Experience with site reliability engineering methodologies.
  • Experience with the configuration and maintenance of EMR, Hadoop, Kafka, and Elasticsearch clusters.
  • Experience with data warehousing and real-time processing frameworks.
  • Experience with Hadoop management platforms, including the HDP stack.
  • Telecommuting permitted.