(USA) Principal, Software Engineer

Walmart, Bentonville, AR
$110,000 - $286,000

About The Position

Join Walmart as a Principal Software Engineer for the Colony Platform within our AI & Data organization and help make it trivially easy for associates (engineers, data scientists, and builders) to go from idea to AI-based solutions quickly, safely, and cost-effectively. At Walmart, we operate at Fortune #1 scale. Our challenges are complex, global, and deeply meaningful. Here, you’ll do the best work of your life: work that helps people save money and live better.

About AI & Data

The AI & Data organization in Walmart Global Tech is building a platform-first ecosystem that centralizes enterprise data, provides AI foundations, and delivers intuitive AI solutions for all personas. We are enabling AI at enterprise scale: responsibly, securely, and reliably.

Position Summary

As a Principal Software Engineer, you will serve as a senior technical authority and hands-on architect across critical AI & Data platforms. You will shape system architecture, influence engineering standards, and drive platform strategy that supports AI-powered experiences across Walmart globally.

Colony is an agentic AI framework designed to orchestrate complex, AI-driven workflows in a multi-tenant environment. It provides a robust, API-driven process execution layer built on the Camunda Zeebe engine. You design agent behavior as visual workflows that are executed by a distributed process engine. The role covers the performance, reliability, and safety of that execution, as well as shipping a great end-user experience.

This role requires deep expertise in distributed systems, platform engineering, and AI-enabled architectures, combined with the ability to influence across teams and elevate technical rigor enterprise-wide.

Requirements

  • 12+ years of experience building highly available, distributed systems.
  • Proven track record delivering complex, enterprise-scale software systems from inception to production.
  • Strong proficiency in Python (building libraries/services/tools), including packaging/dependencies, logging, and performance troubleshooting.
  • Working knowledge of OAuth2/OIDC authentication and scope/permission models.
  • Familiarity with schema/contract frameworks (JSON Schema, OpenAPI, Pydantic, protobuf) and backward-compatible tool evolution.
  • Experience with observability: structured logging, metrics, traces, and debugging distributed flows across client + gateway.
  • Experience working with AI/ML ecosystems in production environments.
  • Strong architectural judgment and ability to evaluate trade-offs.
  • Exceptional communication, influence, and consensus-building skills.
  • Demonstrated ability to mentor and grow technical talent.
  • Intellectual curiosity and ability to rapidly learn new domains and technologies.
  • Bachelor’s degree in Computer Science, Computer Engineering, Computer Information Systems, Software Engineering, or related field and 5 years’ experience in software engineering or related area; or, alternatively, 7 years’ experience in software engineering or related area.

Nice To Haves

  • Master’s degree in Computer Science or related field.
  • Experience building AI-first platforms or internal developer ecosystems.
  • Experience with LLM tool-calling / agentic systems (structured tool invocation, schema validation, prompt/tool definition design, guardrails).
  • Experience integrating data/query execution systems (e.g., BigQuery-like workflows) with governance/cost controls.
  • Strong command of CLI + Git workflows and building developer productivity tooling (SDKs, CLIs, templates, diagnostics).
  • Experience embedding responsible AI and governance practices into engineering workflows.
  • Knowledge of accessibility best practices, including WCAG 2.2 AA standards.

Responsibilities

  • Own high-impact architectural decisions.
  • Drive scalable, resilient system design.
  • Prototype and productionize advanced AI-enabled capabilities.
  • Mentor senior engineers and act as a force multiplier.
  • Balance long-term platform sustainability with near-term business outcomes.
  • Define and evolve reference architectures for distributed systems, AI pipelines, and platform services.
  • Drive system design reviews ensuring scalability, reliability, observability, and cost efficiency.
  • Design systems that integrate cleanly with enterprise data and AI foundations.
  • Make thoughtful design trade-offs balancing long-term platform integrity with short-term delivery needs.
  • Improve the reliability and quality of the end-to-end system across local client + gateway + external APIs (debugging, telemetry, performance tuning).
  • Implement security and compliance guardrails for local execution (least privilege, secrets handling, auditing, allowlists/deny lists where appropriate).
  • Build comprehensive testing: unit, integration, contract tests for tool schemas, and end-to-end tests for common workflows.
  • Drive engineering excellence: code reviews, design docs, mentoring, incident follow-ups, and raising operational standards.
  • Lead development of AI-powered services, agent workflows, and internal builder platforms.
  • Prototype and productionize GenAI-enabled capabilities in secure, governed environments.
  • Champion responsible AI patterns, including guardrails and human-in-the-loop design.
  • Design, build, and operate core agent orchestration components (UI → agent core logic → tool manager → local tools).
  • Develop and maintain local tool plugins (Python-based) that perform actions on behalf of the user (file read/list/edit, command execution, integrations).
  • Build robust tool-call validation and execution (schema enforcement, parameter validation, retries, error handling, idempotency, and safe defaults).
  • Integrate with enterprise APIs via HTTPS (e.g., Microsoft Graph) for workflows like user lookup, email/calendar actions, and related productivity scenarios.
  • Contribute code and prototypes for complex, high-risk, or ambiguous initiatives.
  • Raise engineering standards through rigorous code reviews, operational reviews, and architectural discussions.
  • Improve CI/CD, reliability engineering, and platform observability practices.
  • Establish end-to-end performance, reliability, and cost benchmarks across local client + gateway + external APIs (debugging, telemetry, performance tuning).
  • Partner with product, governance, enterprise data, and infrastructure teams.
  • Translate complex technical concepts into business-impact narratives.
  • Drive consensus across senior engineers and engineering leaders.
  • Influence multi-team roadmaps and reduce architectural fragmentation.
  • Mentor senior engineers and emerging technical leaders.
  • Elevate architectural maturity across teams.
  • Create reusable frameworks, patterns, and internal documentation that scale impact beyond your direct team.

Benefits

  • Beyond our great compensation package, you can receive incentive awards for your performance.
  • Other great perks include 401(k) match, stock purchase plan, paid maternity and parental leave, PTO, multiple health plans, and much more.
  • Health benefits include medical, vision and dental coverage.
  • Financial benefits include 401(k), stock purchase and company-paid life insurance.
  • Paid time off benefits include PTO (including sick leave), parental leave, family care leave, bereavement, jury duty, and voting.
  • Other benefits include short-term and long-term disability, company discounts, Military Leave Pay, adoption and surrogacy expense reimbursement, and more.
  • You will also receive PTO and/or PPTO that can be used for vacation, sick leave, holidays, or other purposes.
  • Live Better U is a Walmart-paid education benefit program for full-time and part-time associates in Walmart and Sam's Club facilities.
  • Programs range from high school completion to bachelor's degrees, including English Language Learning and short-form certificates.
  • Tuition, books, and fees are completely paid for by Walmart.