Lead Site Reliability Engineer

kate spade new york • North Bergen, NJ

About The Position

We believe that difference sparks brilliance, so we welcome people and ideas from everywhere to join us in stretching what’s possible. At Tapestry, being true to yourself is core to who we are. When each of us brings our individuality to our collective ambition, our creativity is unleashed. This global house of brands – Coach and Kate Spade New York – was built by unconventional entrepreneurs and unexpected solutions, so when we say we believe in dreams, we mean we believe in making them happen. We’re always on a journey to becoming our best, but you can count on this: Here, your voice is valued, your ambitions are supported, and your work is recognized. A member of the Tapestry family, we are part of a global house of brands that has unwavering optimism and is committed to being innovative and wholly inclusive. Visit Our People page to learn more about Tapestry's commitment to equity, inclusion, and diversity.

Requirements

Bachelor’s degree in Computer Engineering or a related field, or foreign equivalent, plus 5 years of experience in the job offered or a related position
Experience in Site Reliability Engineering and Production Support
Experience in building or maintaining highly available systems at scale
Experience in one or more technologies: Java, Node or Python
Experience in deploying, supporting and monitoring new and existing services, platforms and application stacks
Experience in building or maintaining a monitoring platform using tools like Splunk, AppDynamics, Blue Triangle, or Quantum Metric
Experience in working with RUM and Synthetic monitoring tools
Experience in working with one or more Cloud platforms: GCP, AWS or Azure
Experience with project management tools like Jira or Confluence

Responsibilities

Work with the SRE team on projects for users and be directly responsible for uptime.
Ensure system availability (implement SLOs based on SLAs and SLIs), reliability, scalability, and performance of sites across all Tapestry brands.
Ensure reliable and secure production and pre-production environments.
Work with the team to develop tools, automation, processes and metrics to ensure maximum reliability, uptime, and availability for our customers.
Implement monitoring solutions to improve overall system monitoring and alerting.
Build automation and reduce toil.
Be a champion for SRE practices across the wider engineering organization and work with other engineering partners to grow our culture of automation and reliability.
Provide subject matter expertise for functional and technical aspects of the Salesforce Commerce Cloud platform.