About The Position

PetSmart is at a key inflection point in its Digital journey. We are accelerating our Digital transformation roadmap to keep delighting our Pet Parents as we embark on new initiatives, such as sustainable, holistic AI adoption throughout the organization, that will fast-track our success in the industry. As part of this effort, we are looking for an experienced, hands-on technical Site Reliability Engineering (SRE) leader who is excited by the opportunity to chart and enact our vision for the future!

In this role, you will mature a nascent operations capability by applying software engineering principles to IT operations and automating manual tasks, so that our customer-facing Digital Properties (PetSmart.com, the Android and iOS mobile apps, Marketing Technology and supporting services) are highly available, reliable and scalable. You will bridge the gap between software development and operations, improving system availability, minimizing outages and ensuring faster recovery through system observability, problem analysis, automation and best practices. You will lead incident response, change management and capacity planning, along with improvements in developer tooling and automation initiatives. You will manage the department's success by setting clear service-level indicators (SLIs), objectives (SLOs) and agreements (SLAs), and you will balance system reliability with innovation, improved collaboration and support for modern, complex infrastructure.

As an SRE Manager within the PetSmart Technology organization, you will be based in Phoenix, Arizona.

Requirements

  • 10+ years of demonstrated experience in a Site Reliability Engineering, DevOps, or infrastructure-focused role
  • 5+ years of experience leading and managing high-performing SRE teams
  • Bachelor's or Master's degree in Computer Science or a related field, or equivalent experience
  • Proven track record of leading sophisticated SRE projects and enterprise services at large scale
  • Strong analytical, troubleshooting and problem-solving skills
  • Deep technical knowledge of distributed systems, cloud computing, modern web services architectures and security platforms, with the ability to quickly understand and respond to peer teams' needs
  • Hands-on experience with monitoring and data analysis tools (e.g., Prometheus, Splunk, Grafana, CloudWatch)
  • Solid fundamentals in release management and continuous integration
  • Ability to communicate with large cross-functional teams about engineering topics such as system architecture, detailed design, APIs and project schedules
  • Ability to make the right trade-offs when dealing with functional complexity, conflicting priorities and aggressive schedules
  • Represent the team and remove hurdles to enable each team member to operate at the highest level of efficiency and productivity
  • Ability to hire, mentor and manage the performance of a mid-size to large team
  • Ability to connect with senior executives and business stakeholders
  • A learning attitude and a drive to continuously improve yourself, your team and the organization
  • Ability to work under pressure and manage difficult situations in a fast-paced work environment

Responsibilities

  • Building, developing and retaining a high-performing team of software engineers, systems analysts and vendor partners, and creating an environment where they can thrive and succeed by providing technical guidance that draws out their best work
  • Ensuring quality in every deliverable through creative thinking, strong problem solving and close collaboration with cross-functional teams in a fast-paced environment
  • Drive major incident management to restore order, and run structured, blameless root cause analyses (RCAs) to learn from these issues and raise the performance bar
  • Innovate, identify opportunities for automation and drive automation efforts across platforms and applications. Actively participate in architectural and functional design, implementation and troubleshooting sessions. Spearhead the design and implementation of comprehensive monitoring for applications, integrations and anomalies, and integrate systems into CI/CD pipelines and performance measurement.
  • Act as the primary point of contact for eCommerce and MarTech systems availability and reliability requirements
  • Collaborate with customer service teams to ensure seamless customer service and high customer satisfaction, and to resolve issues quickly
  • Manage our managed service partner relationships, ensuring alignment with internal operational standards, enforcing SLA compliance, and executing strategic plans to transition capabilities in-house when appropriate.
  • Hold vendors accountable for performance and service delivery, ensuring adherence to contractual obligations and the provision of high-quality, SLA-compliant support.

Benefits

  • Pet-friendly environment, bring your pets to work and enjoy the on-site dog park!
  • On-Site Events & Adoptions, enjoy community-building opportunities, including pet adoption days, seasonal celebrations, family events, art events, & holiday festivals
  • “Top Dog” gym with equipment, fitness classes, massage therapists, personal trainers, and wellness spaces
  • “Sit & Stay” Café serving fresh breakfast and lunch options, snacks, & more
  • “Lil Paws” NAEYC-accredited onsite childcare facility providing high-quality early education
  • Paid Volunteer Opportunities to spend time doing good for causes close to your heart
  • Print Center and Business Services, Dry Cleaning, Mother's Rooms, Sustainable Infrastructure & more