About The Position

Walmart Global Tech is redefining enterprise resilience and operational continuity at an unprecedented scale. We are seeking a Senior Director, Agentic AI to lead the development and deployment of intelligent, automated resilience systems that ensure uninterrupted operations across our global omnichannel ecosystem. From safeguarding multi-billion-dollar revenue streams to protecting essential services for millions of customers, this role will be critical in advancing Walmart's mission to help people save money and live better through bulletproof operational resilience and proactive disaster recovery capabilities. It is also central to ensuring zero downtime for mission-critical services, advancing AI-driven recovery automation, and redefining how associates, customers, and systems interact across Walmart's omnichannel ecosystem. From One Click DR Certification to goal-driven AI agents for predictive recovery and operational continuity, you will shape the backbone of Walmart's global resilience strategy.

Requirements

  • 16+ years of experience in Site Reliability Engineering, Production Engineering, and Infrastructure Reliability, with extensive leadership across Fortune 50 enterprises and global-scale platforms.
  • Proven track record of building and leading enterprise-scale programs, including automation of disaster recovery certification, resiliency-as-a-service platforms, and large-scale incident management.
  • Deep understanding of hybrid cloud architectures (private + public) using OpenStack, GCP, and Azure, along with networking expertise.
  • Hands-on expertise with Agentic AI and resilience automation, including design of AI-powered reasoning systems for predictive risk detection, automated failover orchestration, and policy-driven continuity validation.
  • Deep expertise in:
      o Building enterprise platforms and certification frameworks
      o Designing agentic AI systems for reasoning, prediction, and multi-step task execution
      o Multi-region recovery, data synchronization, and simulation exercises
      o Recovery Time Objective (RTO) and Recovery Point Objective (RPO) governance
  • Recognized leader in SRE and Resilience Engineering, driving critical business continuity programs and Site Reliability Teams
  • Demonstrated impact at Fortune 50 scale, including protecting multi-billion-dollar eCommerce and retail operations during high-stakes events
  • Option 1: Bachelor's degree in computer science, computer engineering, computer information systems, software engineering, or related area and 7 years' experience in site reliability engineering, site and system administration, infrastructure management, or related area.
  • Option 2: 9 years' experience in site reliability engineering, site and system administration, infrastructure management, or related area.
  • 4 years' supervisory experience.

Nice To Haves

  • Retail/eCommerce Expertise: Experience with high-volume transaction systems, point-of-sale infrastructure, inventory management, and fulfillment operations
  • SRE SWAT Team Development: Building and operationalizing elite rapid-response SRE teams trained for large-scale incident command, playbook execution, and real-time mitigation.
  • Global Operations: Experience managing disaster recovery across multiple geographic regions and regulatory environments
  • Experience managing programs across hybrid cloud and distributed architectures.
  • Knowledge of retail, supply chain, or eCommerce AI applications is a strong plus.
  • Master's degree in site reliability engineering, site and system administration, infrastructure management, or related area and 5 years' experience in site reliability engineering, site and system administration, infrastructure management, or related area.
  • SRE certification (for example, IBM Cloud Site Reliability Engineer).
  • We value candidates with a background in creating inclusive digital experiences, demonstrating knowledge in implementing Web Content Accessibility Guidelines (WCAG) 2.2 AA standards, assistive technologies, and integrating digital accessibility seamlessly. The ideal candidate would have knowledge of accessibility best practices and join us as we continue to create accessible products and services following Walmart's accessibility standards and guidelines for supporting an inclusive culture.

Responsibilities

  • Strategic Vision & Leadership
  • Technology Execution
  • Build and Lead SRE SWAT Team
  • Business Continuity
  • Disaster Recovery
  • Team Building & Talent Development
  • Enterprise Risk Management & Governance
  • External & Internal Influence
  • Industry Leadership & Innovation

Benefits

  • At Walmart, we offer competitive pay as well as performance-based bonus awards and other great benefits for a happier mind, body, and wallet. Health benefits include medical, vision and dental coverage. Financial benefits include 401(k), stock purchase and company-paid life insurance. Paid time off benefits include PTO (including sick leave), parental leave, family care leave, bereavement, jury duty, and voting. Other benefits include short-term and long-term disability, company discounts, Military Leave Pay, adoption and surrogacy expense reimbursement, and more.
  • You will also receive PTO and/or PPTO that can be used for vacation, sick leave, holidays, or other purposes. The amount you receive depends on your job classification and length of employment. It will meet or exceed the requirements of paid sick leave laws, where applicable.
  • Live Better U is a Walmart-paid education benefit program for full-time and part-time associates in Walmart and Sam's Club facilities. Programs range from high school completion to bachelor's degrees, including English Language Learning and short-form certificates. Tuition, books, and fees are completely paid for by Walmart.