Service Analyst II

Allstate
McCullom Lake, IL

About The Position

The Service Analyst II for NOC Operations is an experienced technical professional on our Standard Incident service team who leads the monitoring, troubleshooting, and resolution of infrastructure incidents across our enterprise technology ecosystem. As a strong contributor to our “Zero Wait” customer obsession initiative, this role drives rapid and effective response to system alerts, ensuring the reliability and performance of Allstate’s critical infrastructure. The position calls for demonstrated proficiency in UNIX, Storage, and Backup environments, along with broad versatility across Windows, Nutanix, Azure, and AWS platforms. Working within our product-centric operating model, the Service Analyst II applies deep technical knowledge and operational expertise in collaboration with Digital Product Teams (DPTs) and other Service Teams to optimize service delivery, eliminate friction points, and design automation solutions that enhance system reliability and customer experience.

Requirements

  • Advanced working knowledge of enterprise systems, with demonstrated ability to troubleshoot complex issues: UNIX/Linux (Red Hat); Storage SAN, NAS, and OBS (Brocade, Hitachi, Pure, Scality, Cisco MDS); Backup (Rubrik); Windows; and Nutanix
  • Strong understanding of incident and event management processes and frameworks, and of service delivery optimization
  • Advanced troubleshooting and analytical skills with demonstrated ability to resolve complex and novel technical challenges
  • Strong communication abilities, particularly during high-pressure situations and when explaining technical concepts to non-technical stakeholders
  • Strong understanding of automation concepts and tools (GitHub, Ansible, Jenkins, CI/CD pipelines)
  • Advanced scripting skills (Bash, Python, PowerShell) with the ability to develop and contribute to automation solutions
  • Strong skills with the monitoring tools Datadog and Azure Data Explorer (ADX), including custom dashboard usage and alert tuning
  • Working knowledge of DevOps practices and principles with experience implementing infrastructure as code
  • Leadership experience in a 24/7 operational environment

Nice To Haves

  • 6+ years of experience in technical operations, IT support, or system administration
  • Familiarity with AI-assisted tools for incident triage, root cause analysis, or automated remediation workflows
  • Familiarity with large language model (LLM) tools (e.g., GitHub Copilot, Microsoft Copilot, or similar) to accelerate documentation, scripting, and knowledge base development

Responsibilities

  • Serve as the primary technical escalation point for complex incidents across Storage, UNIX, and Backup technology stacks, providing expert guidance to junior analysts
  • Lead incident response during critical outages, coordinating actions across team members and engineering partners to drive rapid resolution
  • Coach and mentor Service Analyst I team members, providing technical guidance and supporting career development
  • Develop and maintain deep technical expertise across multiple infrastructure domains
  • Drive knowledge sharing across the NOC team through documentation, SOPs, and active participation in training and development sessions
  • Lead by example in proactive monitoring and rapid response to alerts across multiple technology stacks, consistently achieving strong Mean Time to Acknowledge (MTTA) and Mean Time to Resolve (MTTR) metrics
  • Champion “Zero Wait” principles in incident response, taking immediate and decisive action on Emergency Command Center (ECC) calls without waiting to be prompted
  • Critically evaluate and improve Standard Operating Procedures (SOPs) while identifying strategic opportunities to enhance processes and automate complex tasks
  • Lead shift turnover meetings to ensure seamless 24/7 operational coverage and comprehensive knowledge transfer
  • Actively contribute to and help prioritize the Service Improvement Backlog (SIB) with strategic ideas that substantially enhance service delivery and eliminate customer friction points
  • Provide advanced Level 2 support for enterprise infrastructure systems, including Storage SAN, NAS, and OBS (Brocade, Hitachi, Pure, Scality, Cisco MDS); Linux (Red Hat); and Backup (Rubrik) technologies, with minimal escalation required for common and moderately complex issues
  • Design and execute incident remediation strategies for both documented and novel problems, applying advanced technical judgment in complex or ambiguous scenarios
  • Lead post-incident retrospective reviews and problem management activities, driving root cause analysis and implementing preventive measures against recurring issues
  • Partner closely with engineering teams to implement and validate system changes, providing strong operational perspective and ensuring readiness for production
  • Apply advanced techniques with monitoring tools including Netcool, Tivoli, Prism Element / Central, Datadog, and Azure Data Explorer (ADX) to proactively identify, diagnose, and prevent system issues
  • Analyze complex incident patterns and trends to architect automation solutions that reduce manual intervention and improve service outcomes
  • Develop comprehensive knowledge base strategies and ensure knowledge artifacts meet high quality standards for accuracy and usability across the NOC team
  • Present at service reviews and demo sessions with technology partners, showcasing NOC operational improvements and contributions
  • Drive the team’s KPI improvements through technical expertise, process optimization, and a continuous improvement mindset
  • Make significant contributions to the NOC automation pipeline by identifying, designing, and advocating for automation solutions that improve service delivery