(USA) Principal, Software Engineer

Walmart, Sunnyvale, CA

About The Position

Position Summary: Serve as a technical thought leader driving the next phase of Walmart’s Performance and Resiliency Engineering. Architect, build, and scale intelligent agentic AI/ML systems that proactively optimize speed, reliability, and business continuity across Walmart’s global platforms. Operate at the intersection of engineering, data science, and business, translating visionary ideas into actionable architecture and tangible solutions.

About Team: Building the right technology foundation for infrastructure and platforms is vital to success at Walmart’s scale. Our team builds and maintains the foundational technologies that support the tech organization, including data platforms, enterprise architecture, DevOps, cloud computing, and infrastructure. All of these products and services run on scalable, powerful infrastructure, ensuring a secure and seamless employee and customer experience across stores, digital channels, and distribution centers.

Requirements

  • Proven experience with LLMs, GenAI, RAG, agentic frameworks, and embedding-based workflows.
  • Deep expertise in distributed systems, cloud-native architectures, and scalable microservices (GCP, Azure, Kubernetes, Docker).
  • Strong programming skills: Python, Java, SQL; hands-on with ML frameworks (PyTorch, TensorFlow, Hugging Face Transformers).
  • Experience with performance engineering, chaos engineering, and building resilient, fault-tolerant systems.
  • Demonstrated success in technical leadership, mentoring, and cross-functional collaboration.
  • Strong experimentation background (A/B testing, causal inference) and MLOps (CI/CD, monitoring, drift detection).
  • Excellent communication skills; able to bridge technical and non-technical stakeholders.
  • Option 1: Bachelor's degree in computer science, computer engineering, computer information systems, software engineering, or related area and 5 years’ experience in software engineering or related area.
  • Option 2: 7 years’ experience in software engineering or related area.

Nice To Haves

  • Master’s degree in computer science, computer engineering, computer information systems, software engineering, or related area and 3 years' experience in software engineering or related area.
  • We value candidates with a background in creating inclusive digital experiences, including knowledge of the Web Content Accessibility Guidelines (WCAG) 2.2 AA standards and assistive technologies, and experience integrating digital accessibility seamlessly into products.
  • The ideal candidate has knowledge of accessibility best practices and will join us as we continue to create accessible products and services that follow Walmart’s accessibility standards and guidelines, supporting an inclusive culture.

Responsibilities

  • AI/ML & Agentic System Leadership: Design, fine-tune, and deploy Generative AI models (including LLMs) and agentic frameworks (e.g., RAG, CrewAI) for performance monitoring, anomaly detection, and automated remediation.
  • Develop and optimize LLM-based agents for multi-step reasoning, knowledge grounding, and decision-making.
  • Architect scalable, distributed AI systems with a focus on performance, fault tolerance, and disaster recovery.
  • Integrate external data sources (vector databases, observability stacks) to build dynamic, context-aware, and self-healing systems.
  • Lead the development of LLM evaluation pipelines (factuality, consistency, relevance) and implement safety guardrails.
  • Performance Engineering: Architect and implement AI/ML-driven solutions for continuous performance monitoring, automated tuning, and predictive scaling.
  • Establish and enforce performance benchmarks, SLAs, and SLOs; integrate performance testing into CI/CD pipelines.
  • Leverage advanced observability tools (Grafana, ELK, Splunk, Prometheus) and distributed tracing for actionable insights.
  • Optimize LLM inference (prompt caching, quantization, retrieval filtering) and system throughput.
  • Resiliency & Chaos Engineering: Champion resilient architectures that maintain business continuity during failures or traffic spikes.
  • Lead chaos engineering initiatives: design and execute controlled failure scenarios, analyze impact, and drive improvements.
  • Leverage AI/ML for predictive failure detection, drift monitoring, and autonomous remediation.
  • Develop and maintain playbooks for critical/non-critical dependency failures and disaster recovery.
  • Technical Leadership & Collaboration: Guide engineering teams on best practices, technical design, and architectural decisions for AI/ML and agentic systems.
  • Collaborate with data scientists, ML engineers, SRE, and product teams to operationalize AI/ML models and integrate them into production.
  • Mentor engineers, foster a culture of continuous learning, and contribute to internal platform standards and engineering playbooks.
  • Drive experimentation (A/B testing, multi-armed bandits, causal inference) and champion innovation.
  • Product Integration & Delivery: Partner with cross-functional teams to deliver end-to-end, cloud-native solutions (GCP, Azure, Kubernetes, Docker).
  • Shape the architecture and roadmap for AI-powered performance and resiliency systems.
  • Ensure high standards for quality, security, and performance through rigorous design and code reviews.

Benefits

  • Beyond our great compensation package, you can receive incentive awards for your performance.
  • Other great perks include 401(k) match, stock purchase plan, paid maternity and parental leave, PTO, multiple health plans, and much more.
  • At Walmart, we offer competitive pay as well as performance-based bonus awards and other great benefits for a happier mind, body, and wallet.
  • Health benefits include medical, vision and dental coverage.
  • Financial benefits include 401(k), stock purchase and company-paid life insurance.
  • Paid time off benefits include PTO (including sick leave), parental leave, family care leave, bereavement, jury duty, and voting.
  • Other benefits include short-term and long-term disability, company discounts, Military Leave Pay, adoption and surrogacy expense reimbursement, and more.
  • You will also receive PTO and/or PPTO that can be used for vacation, sick leave, holidays, or other purposes.
  • Live Better U is a Walmart-paid education benefit program for full-time and part-time associates in Walmart and Sam's Club facilities.
  • Programs range from high school completion to bachelor's degrees, including English Language Learning and short-form certificates.
  • Tuition, books, and fees are completely paid for by Walmart.