Position Summary

The Problem Management team under GTP is responsible for implementing Correction of Error (COE) across Walmart. We collaborate with and train engineering teams across the U.S., Sam’s Club, and International markets on COE best practices. We are developing an Agentic AI solution intended for use by both Operations and Engineering teams.

We are seeking a highly skilled Staff Software Engineer with deep technical expertise in designing and implementing agentic AI solutions. This role requires strong AI engineering fundamentals, systems thinking, and operational excellence. The person in this role will be hands-on with software development: enhancing agent capabilities, mentoring engineers, and driving technical leadership.

What You’ll Do

Build production RAG implementations, embedding models, and LLM services for context-aware query answering.
Architect vector database solutions (Milvus) with optimized indexing strategies for semantic search and hybrid retrieval patterns.
Build Model Context Protocol (MCP) servers integrating enterprise knowledge platforms (Confluence, Jira) with hybrid search capabilities.

System Design and Architecture

Apply deep technical expertise to architect, design, build, and integrate AI-powered features with a strong focus on Generative AI solutions.
Deploy containerized microservices on Kubernetes clusters with service mesh patterns, health probes, and observability instrumentation.
Design agent communication systems implementing the Agent-to-Agent (A2A) Protocol.

Collaborate and Innovate

Work in a dynamic, cross-functional environment, sharing ideas and partnering with engineers and architects to design and deliver agentic AI solutions tailored for Problem Management workflows.
Lead code quality initiatives through architectural reviews, design documentation, and technical mentorship of team members.
Stay current with rapid advances in LLM capabilities, RAG techniques, and agent architectures.

What You’ll Bring

At least 6 years of professional experience building scalable distributed systems and data-intensive applications.
Strong experience developing Generative AI applications using Retrieval-Augmented Generation (RAG), Model Context Protocol (MCP), vector databases, and agent orchestration frameworks.
Expert-level proficiency in Python or a similar programming language, including building scalable backend services and data pipelines.
Production experience with large language model integration, prompt engineering, embedding generation, and RAG pattern implementations.
Track record of implementing agent protocols, RPC systems, or event-driven service communication with streaming support.
Strong Kubernetes and container expertise, including orchestration, networking, and cloud-native design principles.
Understanding of observability principles, including structured logging, metrics collection, and distributed tracing.
Experience with agent orchestration libraries (LangChain, LlamaIndex, or similar) or workflow engines.
Solid grounding in OAuth/JWT authentication, token management, and enterprise security patterns.
Strong understanding of RESTful API design and service-to-service communication patterns.
Solid grounding in CI/CD practices.
Experience using the Model Context Protocol (MCP) for agent orchestration and workflow integration.
Strong stakeholder management, communication, and cross-functional collaboration skills.
Knowledge of incident response processes, operational workflows, or IT service management.

The above information has been designed to indicate the general nature and level of work performed in the role. It is not designed to contain or be interpreted as a comprehensive inventory of all duties, responsibilities and qualifications required of employees assigned to this job. The full Job Description can be made available as part of the hiring process.
Job Type
Full-time
Career Level
Mid Level
Number of Employees
11-50 employees