Service Analyst II

Allstate • McCullom Lake, IL

About The Position

The Service Analyst II for NOC Operations is an experienced technical professional on our Standard Incident service team who leads the monitoring, troubleshooting, and resolution of infrastructure incidents across our enterprise technology ecosystem. As a strong contributor to our “Zero Wait” customer obsession initiative, this role drives rapid and effective response to system alerts, ensuring the reliability and performance of Allstate’s critical infrastructure. The position requires demonstrated proficiency in UNIX, Storage, and Backup environments, along with broad versatility across Windows, Nutanix, Azure, and AWS platforms. Working within our product-centric operating model, the Service Analyst II applies deep technical knowledge and operational expertise to collaborate with Digital Product Teams (DPTs) and other Service Teams to optimize service delivery, eliminate friction points, and design automation solutions that enhance system reliability and customer experience.

Requirements

Advanced working knowledge of enterprise systems, with demonstrated ability to troubleshoot complex issues: UNIX/Linux (Red Hat); SAN, NAS, and OBS storage (Brocade, Hitachi, Pure, Scality, Cisco MDS); Backup (Rubrik); Windows; and Nutanix
Strong understanding of incident and event management processes and frameworks, and of service delivery optimization
Advanced troubleshooting and analytical skills with demonstrated ability to resolve complex and novel technical challenges
Strong communication abilities, particularly during high-pressure situations and when explaining technical concepts to non-technical stakeholders
Strong understanding of automation concepts and tools (GitHub, Ansible, Jenkins, CI/CD pipelines)
Advanced scripting skills (Bash, Python, PowerShell) with the ability to develop and contribute to automation solutions
Strong skills with monitoring tools (Datadog, Azure Data Explorer (ADX)) including custom dashboard usage and alert tuning
Working knowledge of DevOps practices and principles with experience implementing infrastructure as code
Leadership experience in a 24/7 operational environment

Nice To Haves

6+ years of experience in technical operations, IT support, or system administration
Familiarity with AI-assisted tools for incident triage, root cause analysis, or automated remediation workflows
Familiarity with large language model (LLM) tools (e.g., GitHub Copilot, Microsoft Copilot, or similar) to accelerate documentation, scripting, and knowledge base development

Responsibilities

Serve as the primary technical escalation point for complex incidents across Storage, UNIX, and Backup technology stacks, providing expert guidance to junior analysts
Lead incident response during critical outages, coordinating actions across team members and engineering partners to drive rapid resolution
Coach and mentor Service Analyst I team members, providing technical guidance and supporting career development
Develop and maintain deep technical expertise across multiple infrastructure domains
Drive knowledge sharing across the NOC team through documentation, SOPs, and active participation in training and development sessions
Lead by example in proactive monitoring and rapid response to alerts across multiple technology stacks, consistently achieving strong Mean Time to Acknowledge (MTTA) and Mean Time to Resolve (MTTR) metrics
Champion “Zero Wait” principles in incident response, taking immediate and decisive action on Emergency Command Center (ECC) calls without waiting to be prompted
Critically evaluate and improve Standard Operating Procedures (SOPs) while identifying strategic opportunities to enhance processes and automate complex tasks
Lead shift turnover meetings to ensure seamless 24/7 operational coverage and comprehensive knowledge transfer
Actively contribute to and help prioritize the Service Improvement Backlog (SIB) with strategic ideas that substantially enhance service delivery and eliminate customer friction points
Provide advanced Level 2 support for enterprise infrastructure systems, including SAN, NAS, and OBS storage (Brocade, Hitachi, Pure, Scality, Cisco MDS), Linux (Red Hat), and Backup (Rubrik) technologies, with minimal escalation required for common and moderately complex issues
Design and execute incident remediation strategies for both documented and novel problems, applying advanced technical judgment in complex or ambiguous scenarios
Lead post-incident retrospective reviews and problem management activities, driving root cause analysis and implementing preventive measures against recurring issues
Partner closely with engineering teams to implement and validate system changes, providing strong operational perspective and ensuring readiness for production
Apply advanced techniques with monitoring tools including Netcool, Tivoli, Prism Element / Central, Datadog, and Azure Data Explorer (ADX) to proactively identify, diagnose, and prevent system issues
Analyze complex incident patterns and trends to architect automation solutions that reduce manual intervention and improve service outcomes
Develop comprehensive knowledge base strategies and ensure knowledge artifacts meet high quality standards for accuracy and usability across the NOC team
Present at service reviews and demo sessions with technology partners, showcasing NOC operational improvements and contributions
Drive the team’s KPI improvements through technical expertise, process optimization, and a continuous improvement mindset
Lead significant contributions to the NOC automation pipeline by identifying, designing, and advocating for automation solutions that improve service delivery