Join Walmart as a Principal Software Engineer for the Colony Platform within our AI & Data organization and help make it trivially easy for associates (engineers, data scientists, and builders) to go from idea to AI-based solutions quickly, safely, and cost-effectively.

At Walmart, we operate at Fortune #1 scale. Our challenges are complex, global, and deeply meaningful. Here, you’ll do the best work of your life: work that helps people save money and live better.

About AI & Data
The AI & Data organization in Walmart Global Tech is building a platform-first ecosystem that centralizes enterprise data, provides AI foundations, and delivers intuitive AI solutions for all personas. We are enabling AI at enterprise scale, responsibly, securely, and reliably.

Position Summary
As a Principal Software Engineer, you will serve as a senior technical authority and hands-on architect across critical AI & Data platforms. You will shape system architecture, influence engineering standards, and drive platform strategy that supports AI-powered experiences across Walmart globally.

Colony is an agentic AI framework designed to orchestrate complex, AI-driven workflows in a multi-tenant environment. It provides a robust, API-driven process execution layer built on the Camunda Zeebe engine. Agent behavior is designed as visual workflows that are executed by a distributed process engine. The role covers the performance, reliability, and safety of that execution, as well as shipping a great end-user experience.

This role requires deep expertise in distributed systems, platform engineering, and AI-enabled architectures, combined with the ability to influence across teams and elevate technical rigor enterprise-wide.
You will:
- Own high-impact architectural decisions
- Drive scalable, resilient system design
- Prototype and productionize advanced AI-enabled capabilities
- Mentor senior engineers and act as a force multiplier
- Balance long-term platform sustainability with near-term business outcomes

What You’ll Do

Architecture & Technical Strategy
- Define and evolve reference architectures for distributed systems, AI pipelines, and platform services.
- Drive system design reviews ensuring scalability, reliability, observability, and cost efficiency.
- Design systems that integrate cleanly with enterprise data and AI foundations.
- Make thoughtful design trade-offs balancing long-term platform integrity with short-term delivery needs.
- Improve the reliability and quality of the end-to-end system across the local client, gateway, and external APIs (debugging, telemetry, performance tuning).
- Implement security and compliance guardrails for local execution (least privilege, secrets handling, auditing, allowlists/deny lists where appropriate).
- Build comprehensive testing: unit tests, integration tests, contract tests for tool schemas, and end-to-end tests for common workflows.
- Drive engineering excellence: code reviews, design docs, mentoring, incident follow-ups, and raising operational standards.

AI-Enabled Platform Engineering
- Lead development of AI-powered services, agent workflows, and internal builder platforms.
- Prototype and productionize GenAI-enabled capabilities in secure, governed environments.
- Champion responsible AI patterns, including guardrails and human-in-the-loop design.
- Design, build, and operate core agent orchestration components (UI → agent core logic → tool manager → local tools).
- Develop and maintain local tool plugins (Python-based) that perform actions on behalf of the user (file read/list/edit, command execution, integrations).
- Build robust tool-call validation and execution (schema enforcement, parameter validation, retries, error handling, idempotency, and safe defaults).
- Integrate with enterprise APIs via HTTPS (e.g., Microsoft Graph) for workflows like user lookup, email/calendar actions, and related productivity scenarios.

Hands-On Technical Leadership
- Contribute code and prototypes for complex, high-risk, or ambiguous initiatives.
- Raise engineering standards through rigorous code reviews, operational reviews, and architectural discussions.
- Improve CI/CD, reliability engineering, and platform observability practices.
- Establish end-to-end performance, reliability, and cost benchmarks across the local client, gateway, and external APIs (debugging, telemetry, performance tuning).

Cross-Functional Influence
- Partner with product, governance, enterprise data, and infrastructure teams.
- Translate complex technical concepts into business-impact narratives.
- Drive consensus across senior engineers and engineering leaders.
- Influence multi-team roadmaps and reduce architectural fragmentation.

Mentorship & Technical Multiplication
- Mentor senior engineers and emerging technical leaders.
- Elevate architectural maturity across teams.
- Create reusable frameworks, patterns, and internal documentation that scale impact beyond your direct team.
Job Type
Full-time
Career Level
Senior
Number of Employees
1-10 employees