Senior Site Reliability Engineering Manager

SHEIN

139d • $230,239 - $235,239

About The Position

The Senior Manager, Site Reliability Engineering is responsible for maintaining a 24x7 production environment with a high level of service availability. This role involves performing quality reviews, managing operational issues, and defining a culture of operational excellence across infrastructure and engineering through SLAs, processes, monitoring, and related practices. The manager provides leadership and direction to engineers responsible for break-fix, uptime, and reliability for core services, distribution, network elements, and related interfaces.

The role also includes people-care management for team members: hiring, setting and monitoring annual performance plans, coaching, and career development. The manager ensures that proper knowledge and career development tools are in place to support ongoing team member development. In collaboration with other engineering teams, the manager sets clear expectations and creates a positive work environment based on accountability. The manager also works with other engineering managers to grow a culture of automation and reliability, continuously improve the 24x7 on-call and incident management process, and lead blameless post-mortems.

Requirements

Master’s degree in Electrical Engineering, Electronic and Information Engineering, Computer Science, or a related field.
3 years of progressively responsible post-baccalaureate experience in the job offered or in any engineering-related role.
Experience in Linux system administration, including network and DNS management, performance tuning, resource optimization, and security practices.
Experience with Cloud Technology including AWS and Azure for scalable computing and globally distributed data management.
Experience architecting and maintaining robust data pipelines using serverless computing platforms such as AWS Lambda and Azure Functions.
Experience with object storage services like AWS S3 and Azure Blob Storage.
Experience designing and managing cloud network architectures, including Virtual Private Clouds (VPC), subnets, load balancing, and DNS configurations.
Experience with site reliability engineering methodologies.
Experience with the configuration and maintenance of EMR, Hadoop, Kafka, and Elasticsearch clusters.
Experience with data warehousing and real-time processing frameworks.
Experience with Hadoop management platforms, including the HDP stack.

Responsibilities

Maintain a 24x7 production environment with a high level of service availability.
Perform quality reviews and manage operational issues.
Define and drive a culture of operational excellence across infrastructure and engineering.
Provide leadership and direction to engineers responsible for break-fix, uptime, and reliability.
Manage people-care for team members including hiring, performance plans, coaching, and career development.
Set clear expectations and create a positive work environment based on accountability.
Collaborate with other engineering teams.
Grow a culture of automation and reliability.
Continuously improve the 24x7 on-call and incident management process.
Lead blameless post-mortems.

Benefits

Telecommuting permitted.

What This Job Offers

Job Type: Full-time
Career Level: Senior
Education Level: Master's degree
