Senior, Software Engineer - AI Systems

Walmart • Bentonville, AR

Onsite

About The Position

We’re seeking a Senior Software Engineer to design and build AI-first systems with a focus on agentic AI, high-performance data and compute frameworks, and scalable, production-grade services. You’ll work across both model-driven features and platform layers: integrating LLMs and agents, orchestrating pipelines with Ray, accelerating data science workloads with RAPIDS, and delivering robust APIs and services that power high-impact AI applications at scale. The ideal candidate combines strong software engineering fundamentals with practical ML systems experience and a passion for performance, reliability, and developer experience.

Requirements

Bachelor's or Master's degree in computer science, engineering, or a related field, or equivalent industry experience.
4+ years building production backend or platform services (preferably in AI/ML contexts).
Proficiency in languages: Python (primary), plus one of Go, Java, or C++ for performance-critical services.
Proficiency with distributed frameworks: Ray, Spark, or Dask.
Proficiency with accelerated compute: RAPIDS (cuDF, cuML, cuGraph) and GPU-aware programming concepts (e.g., streams, memory management).
Proficiency with service frameworks and deployment: FastAPI or Flask (Python), Kubernetes (K8s), and containerization with Docker.
Strong foundations in data structures/algorithms, concurrency, networking, and systems design.
Option 1: Bachelor's degree in computer science, computer engineering, computer information systems, software engineering, or a related area and 3 years’ experience in software engineering or a related area.
Option 2: 5 years’ experience in software engineering or a related area.

Nice To Haves

Production experience with agent frameworks (e.g., LangGraph-style planners, tool-use patterns, retrieval and memory components).
Experience with vector databases (e.g., FAISS, Milvus, pgvector, Pinecone) and feature stores.
Familiarity with LLM and embedding services, prompt/tooling patterns, and evaluation harnesses.
Hands-on experience with Kubernetes, autoscaling (HPA/KEDA), and GPU scheduling and operators.
Experience with performance profiling tools such as the PyTorch Profiler, NVIDIA Nsight, line_profiler, and the Ray Dashboard.
Experience with vLLM, Triton Inference Server, ONNX Runtime, or TensorRT for high‑throughput inference.
Pragmatic problem solver with a bias for measurable outcomes (latency, throughput, reliability).
Excellent communicator able to translate between research goals and production constraints.
Drives clarity in ambiguous problem spaces; mentors others and raises engineering standards.
Background in creating inclusive digital experiences, including implementing the Web Content Accessibility Guidelines (WCAG) 2.2 AA standards, supporting assistive technologies, and building accessibility into the development process from the start.
Knowledge of accessibility best practices. Join us as we continue to create accessible products and services that follow Walmart’s accessibility standards and guidelines in support of an inclusive culture.
Master’s degree in Computer Science, Computer Engineering, Computer Information Systems, Software Engineering, or a related area and 1 year's experience in software engineering or a related area.

Responsibilities

Build agentic AI services (planning, tool use, retrieval, feedback loops) and integrate them with internal systems and APIs.
Implement orchestration, memory, tooling, evaluation, and guardrails for agentic workflows.
Collaborate with data science (DS) and machine learning engineering (MLE) partners to productionize models (LLMs, GNNs, embedding services) behind stable APIs and SDKs.
Develop GPU‑accelerated pipelines using RAPIDS (cuDF/cuML/cuGraph) and optimize end‑to‑end performance.
Use Ray (or similar) for distributed compute, batch/stream processing, and scalable workflow orchestration.
Profile and optimize bottlenecks across CPU/GPU, memory, and I/O layers; implement caching, vectorization, and async patterns.
Design and maintain reliable microservices for training/inference, vector indexing, and real-time decisioning.
Implement observability (tracing/metrics/logging), fault tolerance, auto-scaling, and cost-aware execution.
Create internal SDKs/CLIs to streamline developer workflows, testing, and reproducibility.
Establish CI/CD for AI services (unit, integration, and end-to-end tests; canary releases; blue/green deployments; rollbacks).
Integrate with feature stores, vector databases, artifact registries, and model catalogs.
Enforce security, privacy, and compliance (data minimization, PII handling, governance, auditability).
Partner with product, platform, and DS/MLE teams to align requirements, SLAs, and success metrics.
Document systems thoroughly; contribute to design reviews and engineering best practices.
Mentor peers on AI systems patterns, distributed compute, and performance engineering.

Benefits

Incentive awards for performance
401(k) match
Stock purchase plan
Paid maternity and parental leave
PTO
Multiple health plans
Competitive pay
Performance-based bonus awards
Company-paid life insurance
Family care leave
Bereavement leave
Jury duty leave
Voting leave
Short-term and long-term disability
Company discounts
Military leave pay
Adoption and surrogacy expense reimbursement
PTO and/or PPTO that can be used for vacation, sick leave, holidays, or other purposes
Walmart-paid education benefit program (Live Better U) for full-time and part-time associates