Manager, Site Reliability Engineering

David Yurman
New York, NY
$180,000 - $185,000

About The Position

The David Yurman IT Team is focused on being the market leader in luxury customer experience. We aim to provide a true end-to-end experience for customers and associates. To do this, we need individuals who can function as part of a team, collaborate, and bring others along on the journey toward excellence and quality. We are looking for imaginative people with a pragmatic, tactical view of how to drive initiatives from conceptual design through operational excellence, communicating effectively the entire way.

We are looking for a skilled and passionate SRE Manager to lead our Cloud Engineering team. The ideal candidate will have a deep understanding of infrastructure, scalability, and reliability in the context of large-scale, customer-facing applications. As an SRE Manager, you’ll play a key role in ensuring the availability, performance, and reliability of our retail ecosystem, driving the evolution of our infrastructure, and fostering a high-performing, resilient culture.

Requirements

  • Proficient in programming and scripting languages, particularly PowerShell, Bash, or Python
  • Extensive experience building all aspects of cloud infrastructure, primarily AWS; Oracle Cloud Infrastructure is a plus.
  • AWS certifications are desirable.
  • Proficient with Infrastructure as Code, Terraform preferred.
  • HashiCorp Terraform certification is desirable.
  • Strong understanding of networking, security, and database technologies
  • ~6 years of experience planning, designing, building, and implementing IT systems.
  • ~3 years must be in direct supervision and management of major projects that involve providing professional support services and/or integrating, implementing, and transitioning large, complex system and subsystem architectures.
  • Ability to work effectively in an agile-focused environment and to work cross-functionally as a team with business and technology counterparts.
  • Ability to communicate clearly and effectively with the business, stakeholders, and wider audiences.
  • Self-motivated, willing to work in a 24/7 environment without close supervision.
  • Analytical and quality-driven
  • Experience with monitoring and observability tools (e.g., PagerDuty, AWS CloudWatch, SolarWinds)

Responsibilities

  • Serve as a true player/coach, comfortable with a balance of leadership and hands-on technical contributions
  • Oversee system reliability and availability, ensuring high uptime for our applications and services.
  • Implement Infrastructure as Code (IaC) using Terraform for efficient infrastructure management.
  • Monitor system performance and respond to incidents, maintaining system health and security.
  • Develop and enforce SRE best practices and processes
  • Drive automation to enhance operational efficiency
  • Promote a culture of continuous learning and improvement
  • Collaborate with development teams to build scalable and resilient systems
  • Establish and monitor service level objectives (SLOs) and service level indicators (SLIs)
  • Conduct post-mortems to prevent future issues

Benefits

  • Competitive salary of $180,000 - $185,000
  • Hybrid work environment