You are right for the job if you are comfortable with system design, architecture, deep technical Linux and networking topics, and distributed architectures. You will work cross-functionally with a variety of teams and be a core contributor to every significant engineering service or solution that we deliver to our stakeholders. You will excel if you have enthusiasm for digging deep and a flair for sharp technical communication, prioritization, and organization. You will work directly with our Software Engineering teams to build our next-generation “always up” cloud-based e-commerce/Retail and Enterprise platform.

About Team:
Building the right technology foundation for infrastructure and platforms is vital to success at the scale of Walmart. Our team builds and maintains the foundational technologies that support the tech organization, including data platforms, enterprise architecture, DevOps, cloud computing, and infrastructure. All of these products and services are supported by scalable and powerful infrastructure, ensuring a secure and seamless employee and customer experience across stores, digital channels, and distribution centers.

What you'll do...
Take on-call responsibilities to help minimize MTTD and MTTR for SRE products.
Experience with containerization and container platforms (e.g., Docker, Kubernetes, Docker EE, OpenShift, Mesosphere).
Skills to understand debugging information, “drain” traffic away from a cluster, roll back a bad software push, block or rate-limit unwanted traffic, bring up additional serving capacity through autoscaling features, and use the monitoring systems (for alerting and dashboards).
Engage with enterprise and business/infrastructure functions to establish, track, and optimize operational metrics and targets in line with SRE principles (SLO/SLI, latency percentiles, error budgets, tech debt, and alert guidelines).
Programming/tooling and automation experience in one or more of the following languages: Golang, Java, Python, TypeScript, Node, and Shell.
Good understanding of Kafka internals, SQL/NoSQL databases like Cassandra, Elasticsearch, and Postgres, and in-memory caching frameworks like Memcached.
Influence, design, and create new architectures, standards, and methods for large-scale enterprise systems.
Design, write, and build tools to improve the reliability, latency, availability, and scalability of Walmart e-commerce/Retail and Enterprise products.
Engender reliability and availability, starting with metrics and measurements.
Enable scaling by providing tools, developing training, and/or augmenting processes.
Build tools and automation to prevent recurrence of problems in mission-critical products/services.
Augment existing instrumentation to build a cohesive picture of the characteristics of our systems, with special attention to points of failure.
Participate in capacity planning, demand forecasting, software performance analysis, and system tuning.
Develop a deep understanding of the numerous services and applications that come together to deliver Walmart e-commerce/Retail and Enterprise products.
Working knowledge of observability tools and enterprise monitoring solutions like Dynatrace, AppDynamics, New Relic, Prometheus, etc.
Perform root-cause analysis of complex problems involving multiple parties, networks, hardware, and software that relate to scaling and performance.
Secure the system from issues, be they real, perceived, or notional.
Job Type: Full-time
Career Level: Mid Level
Number of Employees: 5,001-10,000 employees