Manager, Site Reliability Engineering

David Yurman•New York, NY

115d•$180,000 - $185,000

About The Position

The David Yurman IT Team is focused on being the market leader in luxury customer experience. We aim to provide a true end-to-end experience for customers and associates. To do this, we need individuals who can function as part of a team, collaborate, and bring others along on the journey toward excellence and quality. We are looking for imaginative people with a pragmatic, tactical view of how to drive initiatives from conceptual design through operational excellence, communicating effectively the entire way.

We are looking for a skilled and passionate SRE Manager to lead our Cloud Engineering team. The ideal candidate will have a deep understanding of infrastructure, scalability, and reliability in the context of large-scale, customer-facing applications. As an SRE Manager, you’ll play a key role in ensuring the availability, performance, and reliability of our retail ecosystem, driving the evolution of our infrastructure, and fostering a high-performing, resilient culture.

Requirements

Proficient in programming and scripting languages, particularly PowerShell, Bash, or Python.
Extensive experience building all aspects of cloud infrastructure, primarily on AWS; Oracle Cloud Infrastructure is a plus.
AWS certifications are desirable.
Proficient with Infrastructure as Code; Terraform preferred.
HashiCorp Terraform certification is desirable.
Strong understanding of networking, security, and database technologies.
~6 years of experience planning, designing, building, and implementing IT systems.
~3 years must be in the direct supervision and management of major projects that involve providing professional support services and/or integrating, implementing, and transitioning large, complex system and subsystem architectures.
Ability to work effectively in an agile-focused environment and to work cross-functionally as a team with business and technology counterparts.
Ability to communicate clearly and effectively to the business, stakeholders, and wider audiences.
Self-motivated, willing to work in a 24/7 environment without close supervision.
Analytical and quality-driven.
Experience with monitoring and observability tools (e.g., PagerDuty, AWS CloudWatch, SolarWinds).

Responsibilities

Serve as a true player/coach, comfortable balancing leadership with hands-on technical contributions.
Oversee system reliability and availability, ensuring high uptime for our applications and services.
Implement Infrastructure as Code (IaC) using Terraform for efficient infrastructure management.
Monitor system performance and respond to incidents, maintaining system health and security.
Develop and enforce SRE best practices and processes.
Drive automation to enhance operational efficiency.
Promote a culture of continuous learning and improvement.
Collaborate with development teams to build scalable and resilient systems.
Establish and monitor service level objectives (SLOs) and service level indicators (SLIs).
Conduct post-mortems to prevent future issues.

Benefits

Competitive salary of $180,000 - $185,000
Hybrid work environment

What This Job Offers

Job Type: Full-time
Career Level: Manager
Education Level: Bachelor's degree
Number of Employees: 5,001-10,000 employees
