Manager, DevOps (AI)

Petco · Natick, MA
1d · Remote

About The Position

Create a healthier, brighter future for pets, pet parents and people! If you want to make a real difference, create an exciting career path, feel welcome to be your whole self and nurture your wellbeing, Petco is the place for you. Our core values capture that spirit as we work to improve lives by doing what’s right for pets, people and our planet.

  • We love all pets like our own
  • We’re the future of the pet industry
  • We’re here to improve lives
  • We drive outstanding results together
  • We’re welcome as we are

Petco is a category-defining health and wellness company focused on improving the lives of pets, pet parents and Petco partners. We are 29,000 strong and operate 1,500+ pet care centers in the U.S., Mexico and Puerto Rico, including 250+ Vetco Total Care hospitals, hundreds of preventive care clinics and eight distribution centers. We’re focused on purpose-driven work, and strongly believe what’s good for pets, people and our planet is good for Petco.

NAME OF EMPLOYER: Petco Animal Supplies Stores, Inc.
POSITION TITLE: Manager, DevOps (AI)
POSITION LOCATION: 654 Richland Hills Dr., San Antonio, TX 78245 - 100% remote position
HOURS: Full Time/40 hours

JOB DUTIES: Responsible for leading and scaling technology functions that support AI/ML workloads in a cloud-native environment. This role involves overseeing the design, implementation, and maintenance of automated deployment pipelines, cloud infrastructure, and monitoring systems to ensure high availability, scalability, and security of AI-driven applications. Collaborate closely with data scientists, machine learning engineers, and software developers to streamline model deployment, accelerate experimentation, and optimize operational practices. Drive cloud cost optimization strategies, including effective resource allocation, right-sizing, and leveraging cloud-native tools to reduce operational expenses while maintaining performance and reliability. Serve as a technical leader with deep knowledge of cloud platforms (AWS, Azure, or GCP), infrastructure-as-code, containerization, and AI workflows. Mentor the team and foster a culture of continuous improvement, reliability, and efficiency.

Requirements

  • Bachelor’s degree or equivalent in Computer Science, or related field of study and 7 years of progressive experience in Development Operations. Employer will accept a Master’s degree or equivalent in Computer Science or related field of study and 3 years of post-baccalaureate experience in Development Operations in lieu of a Bachelor’s degree or equivalent and 7 years of progressive experience.
  • 7 years (3 years with Master’s) of experience in Cloud operations.
  • 4 years (2 years with Master’s) of experience in a supervisory capacity or serving as a direct lead.
  • 3 years of experience assisting with architecture design and supporting development teams’ management, migrations, and deployments from a data center to public cloud services (Serverless Lambda, Cloud Databases, and EKS).
  • 3 years of experience using GitLab pipelines, Terraform, Vault, and Helm to manage and deploy public cloud infrastructure, Git merges, pull requests, and CI/CD processes.
  • 3 years of experience working in a SOX-audited environment, or an environment audited by a similar governmental body, directly interfacing with auditors to review and respond to audits and findings.
  • 3 years of experience implementing cloud security frameworks.
  • 3 years of experience maintaining a large (at least $5M) annual cloud budget, with experience in cost control and cost-saving measures, including savings plans, reserved instances, Spot marketplace, and preferred pricing models.
  • 2 years of experience with cloud networking, routing, load balancing, CIDR Blocks, Security Groups, Access Lists and policies.
  • Experience managing large and complex cloud infrastructure migration projects.
  • Experience delivering technical presentations on operational efficiency and innovation projects to senior management with varying technical backgrounds.

Responsibilities

  • Lead and scale technology functions that support AI/ML workloads in a cloud-native environment.
  • Oversee the design, implementation, and maintenance of automated deployment pipelines, cloud infrastructure, and monitoring systems to ensure high availability, scalability, and security of AI-driven applications.
  • Collaborate closely with data scientists, machine learning engineers, and software developers to streamline model deployment, accelerate experimentation, and optimize operational practices.
  • Drive cloud cost optimization strategies, including effective resource allocation, right-sizing, and leveraging cloud-native tools to reduce operational expenses while maintaining performance and reliability.
  • Serve as a technical leader with deep knowledge of cloud platforms (AWS, Azure, or GCP), infrastructure-as-code, containerization, and AI workflows.
  • Mentor the team and foster a culture of continuous improvement, reliability, and efficiency.