The Manager, Site Reliability Operations leads the monitoring, incident management, and performance optimization of critical applications and infrastructure. This role ensures compliance with service level objectives by overseeing alerting systems, triaging incidents, and driving root cause analyses to enhance operational stability. Collaborating with cross-functional teams and stakeholders, the manager implements best practices in automation, change management, and continuous improvement. This position fosters a culture of accountability and resilience while supporting strategic initiatives that advance system reliability and business outcomes across varied technology environments.
The Site Reliability Operations team at Walmart ensures the stability and performance of critical systems supporting retail operations. This team collaborates across functions to monitor application health, manage incidents, and implement automation for operational efficiency. Members apply expertise in incident management, DevOps, and stakeholder engagement to maintain service reliability and drive continuous improvement. Focused on proactive monitoring and rapid response, the team supports Walmart’s commitment to delivering seamless experiences for customers and associates while aligning with strategic business objectives through effective operational performance management.
Job Type
Full-time
Career Level
Manager