Walmart processes more transactions in a day than most companies handle in a year. When performance degrades or systems fail, the impact is immediate, measured in millions of dollars and hundreds of millions of customers. We're building the team that prevents those failures using agentic AI.

As a Principal Engineer in Performance and Resiliency Engineering, you'll architect and lead the development of intelligent, self-healing systems: LLM-based agents that detect anomalies, reason across observability data, and trigger automated remediation without waiting for a human in the loop. You'll operate at a scale most AI engineers never encounter: 10,500 stores, 240M weekly customers, and infrastructure that powers one of the world's largest retail ecosystems.

This isn't a research role or a proof-of-concept environment. You'll own the technical strategy, set architectural direction, and ship to production, building agentic systems that directly impact Walmart's global reliability and business continuity.

Building the right technology foundation for Infrastructure & Platforms is vital to success at Walmart's scale. Our team builds and maintains the foundational technologies that power the entire tech organization: data platforms, enterprise architecture, DevOps, cloud computing, and infrastructure. We ship to production weekly, run blameless postmortems, and treat chaos experiments as first-class engineering work. If you thrive in high-ownership environments where your architectural decisions have immediate, measurable impact, this is where you belong.
Job Type
Full-time
Career Level
Principal