(USA) Senior, Software Engineer

Walmart | Sunnyvale, CA
$117,000 - $234,000 | Onsite

About The Position

The Search PTE - DevOps Team at Walmart processes billions of queries for millions of products on Walmart sites and apps worldwide, handling both user searches and product category browsing. The work involves mining structured and semi-structured data from many sources at unprecedented scale, solving big data problems, and applying cutting-edge relevance algorithms from information retrieval, machine learning, and AI-powered ranking to deliver a high-availability, low-latency service that directly impacts business metrics. Being part of this team provides deep insight into the full lifecycle of a product, from content acquisition to its sale on Walmart.com.

As a Senior Software Engineer in DevOps & AI Platform, the engineer supports all systems and services to ensure high availability and reliability, while embracing AI-augmented workflows to accelerate engineering velocity. The position involves close collaboration with developers, AI/ML engineers, and platform teams to support new application features, AI model deployments, and service launches. The engineer will design, build, and operate tools for developing, scaling, and monitoring cutting-edge technology, including GenAI and LLMOps pipelines, and must be able to triage complex technical issues together with other engineering and platform teams.

The ideal candidate is passionate about five-nines (99.999%) reliability and excited about the intersection of AI and platform engineering, with expertise in continuous integration and delivery pipelines, containerized infrastructure, and AI-assisted development practices. This role plays a critical part in every search application and AI model release cycle, working closely with Engineering, QE, and DevOps.

Requirements

  • Bachelor’s or Master’s degree in Computer Science, Engineering, or a related field
  • 5+ years of experience building scalable eCommerce applications or distributed backend services
  • 3+ years of industry experience in application releases, CI/CD pipelines, and distributed system testing
  • Strong expertise in containerization and orchestration using Kubernetes (including multi-cluster and GPU-node management)
  • 2+ years of programming experience in Python, Go, Java, and shell scripting, with exposure to REST and gRPC API frameworks
  • Experience with modern CI/CD platforms (e.g., Concord, GitHub Actions, Looper) and GitOps workflows (e.g., ArgoCD, Flux)
  • Working knowledge of AI/ML workflows: model serving, inference optimization, or LLM deployment pipelines
  • Familiarity with observability stacks: OpenTelemetry, distributed tracing, log aggregation (e.g., Splunk, OpenObserve), and AI-assisted anomaly detection
  • Option 1: Bachelor's degree in computer science, computer engineering, computer information systems, software engineering, or related area and 3 years’ experience in software engineering or related area.
  • Option 2: 5 years’ experience in software engineering or related area.

Nice To Haves

  • Experience with LLMOps and GenAI platforms: prompt engineering, RAG pipelines, vector databases (e.g., Pinecone, Weaviate, Elasticsearch KNN), and LLM evaluation frameworks
  • Hands-on experience with AI coding assistants (e.g., Wibey, GitHub Copilot) and AI-augmented DevOps tooling
  • Proficiency with WCNP (Walmart Cloud Native Platform) and cloud-native infrastructure on GCP or Azure
  • Knowledge of eBPF-based observability tools (e.g., Cilium, Pixie) and advanced networking concepts (VIP, TCP, Envoy/Istio service mesh)
  • Experience with GPU infrastructure management for AI workloads (CUDA, NVIDIA device plugins for Kubernetes)
  • Familiarity with MLflow, Kubeflow, Ray, or similar MLOps platforms for experiment tracking and model lifecycle management
  • Experience with performance and load testing tools (e.g., Gatling, k6, Locust) to measure server and client-side metrics
  • Knowledge of AI safety and responsible AI practices in production environments (guardrails, content filtering, bias monitoring)
  • Contributions to open-source DevOps, AI/ML, or platform engineering projects are a strong plus
  • Master’s degree in Computer Science, Computer Engineering, Computer Information Systems, Software Engineering, or related area and 1 year's experience in software engineering or related area.
  • Knowledge of implementing Web Content Accessibility Guidelines (WCAG) 2.2 AA standards and assistive technologies, and of integrating digital accessibility seamlessly into products.
  • Knowledge of accessibility best practices, and interest in joining us as we continue to create accessible products and services that follow Walmart’s accessibility standards and guidelines in support of an inclusive culture.

Responsibilities

  • Build, manage, and evolve QE & Release Automation frameworks, incorporating AI-assisted test generation and self-healing test capabilities
  • Build and support Kubernetes-based containerization in production, including GPU-backed workloads for AI/ML inference
  • Independently lead the investigation and resolution of high-impact search system and AI service incidents
  • Build, manage, and support comprehensive monitoring and observability for applications and AI model performance (drift, latency, accuracy)
  • Maintain and improve automation pipelines supporting application build, release, and AI model deployment cycles (CI/CD + MLOps/LLMOps)
  • Integrate AI coding assistants and GenAI tooling (e.g., Wibey, GitHub Copilot) into engineering workflows to accelerate development
  • Design and implement AI-powered observability solutions using intelligent alerting, anomaly detection, and predictive incident management
  • Collaborate with AI/ML teams to operationalize LLM-based features within search, including prompt pipeline management and vector search infrastructure
  • Drive execution and lead medium- to large-scale projects from Dev to Ops, including AI/ML platform initiatives
  • Analyze, design, and build frameworks using cutting-edge technology and AI tools to achieve operational excellence
  • Lead and independently handle high-impact, critical search system and AI service incidents
  • Identify opportunities to improve and optimize the software development lifecycle (SDLC) and the AI deployment lifecycle (MLOps)
  • Provide engineering and QE teams with architectural guidance on solutions, automation frameworks, and AI integration patterns
  • Work with product and engineering teams to review new functional and AI-driven requirements; develop comprehensive test plans and automate test cases, including AI model validation
  • Perform quality assurance for large-scale eCommerce backend search services and AI-powered features
  • Write programs and scripts to automate testing and validation of search backend services and LLM/AI inference pipelines
  • Expertise in WCNP, Concord, Looper, Python, Golang, and Java, with hands-on experience in AI/ML tooling, LLMOps, and GenAI platforms

Benefits

  • Incentive awards for your performance
  • 401(k) match
  • Stock purchase plan
  • Paid maternity and parental leave
  • PTO (including sick leave)
  • Multiple health plans (medical, vision, and dental coverage)
  • Performance-based bonus awards
  • Company-paid life insurance
  • Family care leave
  • Bereavement
  • Jury duty
  • Voting
  • Short-term and long-term disability
  • Company discounts
  • Military leave pay
  • Adoption and surrogacy expense reimbursement
  • Live Better U (Walmart-paid education benefit program for full-time and part-time associates in Walmart and Sam's Club facilities, covering high school completion to bachelor's degrees, English Language Learning and short-form certificates, including tuition, books, and fees)