Infrastructure Solution Engineer

Allstate
USA - TX (Remote), TX
$75,100 - $126,325
Hybrid

About The Position

The Infrastructure Solutions Engineer for NOC Operations is an experienced technical contributor on our Standard Incident service team. The role provides skilled monitoring, troubleshooting, and resolution of infrastructure incidents across our enterprise technology ecosystem. As an integral contributor to our “Zero Wait” customer obsession initiative, this role delivers rapid and effective response to system alerts, ensuring reliable performance of Allstate’s critical infrastructure. In addition to solid proficiency in UNIX, backup, and storage environments, the role calls for versatility across Windows, Nutanix, Azure, and AWS platforms. Working within our product-centric operating model, the Infrastructure Solutions Engineer applies technical depth and operational discipline to collaborate with Digital Product Teams (DPTs) and other Service Teams to improve service delivery, reduce friction points, and contribute to automation solutions that enhance system reliability and customer experience.

Requirements

  • 5+ years of experience in technical operations, IT support, or system administration
  • Solid working knowledge of enterprise systems, including Linux (Red Hat), backup (Rubrik), cloud, Windows, and Nutanix
  • Strong understanding of incident management processes, event management frameworks, and service delivery fundamentals
  • Strong troubleshooting and analytical skills with the ability to work through complex and unfamiliar technical issues
  • Effective communication abilities, particularly during high-pressure situations and when coordinating across teams
  • Experience with ServiceNow reporting or similar ITSM/event management platforms
  • Working understanding of automation concepts and tools (GitHub, Ansible, Jenkins)
  • Scripting skills (Bash, Python, PowerShell) with the ability to contribute to automation solutions
  • Proficiency with monitoring tools such as Datadog and Azure Data Explorer (ADX), including dashboard usage and alert interpretation

Responsibilities

  • Act as a technical escalation point for incidents across Linux (Red Hat), backup (Rubrik), and storage platforms, supporting and guiding junior analysts
  • Provide reliable incident response support during critical outages, coordinating with team members and escalating to senior analysts or engineering when appropriate
  • Mentor and support Service Analyst I team members, offering technical guidance and promoting knowledge development
  • Build and maintain solid technical expertise across multiple infrastructure domains
  • Contribute to team knowledge sharing through documentation, SOPs, and participation in training sessions
  • Proactively monitor and respond to alerts across multiple technology stacks, maintaining strong Mean Time to Acknowledge (MTTA) and Mean Time to Resolve (MTTR) metrics
  • Apply “Zero Wait” principles in incident response, taking immediate action on Emergency Command Center (ECC) calls without waiting to be prompted
  • Follow and improve Standard Operating Procedures (SOPs) while identifying opportunities to enhance processes and automate repetitive tasks
  • Support shift turnover meetings to ensure seamless 24/7 operational coverage and effective knowledge transfer
  • Actively contribute to the Service Improvement Backlog (SIB) with ideas that meaningfully enhance service delivery and reduce customer friction points
  • Provide Level 2 support for enterprise infrastructure systems, including Linux (Red Hat) and backup (Rubrik) technologies
  • Execute incident remediation following documented procedures while exercising sound technical judgment during complex or ambiguous scenarios
  • Participate in post-incident retrospective reviews and problem management activities, contributing to root cause analysis and the prevention of recurring issues
  • Collaborate with engineering teams to implement and test system changes, providing operational perspective and supporting readiness validation
  • Use monitoring tools including Netcool, Tivoli, Nutanix Prism Element and Prism Central, Datadog, and Azure Data Explorer (ADX) to identify, diagnose, and resolve system issues
  • Analyze incident patterns and trends to identify automation opportunities that reduce manual intervention and improve service outcomes
  • Develop and maintain quality knowledge base articles and SOP documentation, ensuring accuracy and usability for the broader NOC team
  • Participate in service reviews and demo sessions with technology partners, representing NOC operations effectively
  • Support the team’s KPI goals through consistent, high-quality service delivery and a continuous improvement mindset
  • Contribute meaningfully to the NOC automation pipeline by identifying, documenting, and advocating for automation of repetitive tasks

Benefits

  • Comprehensive technology setup, including a laptop, monitors, headset, keyboard, and mouse.
  • Monthly connectivity reimbursement for eligible remote employees.
  • Dedicated, private workspace free from distractions, along with appropriate desk and seating (for remote work).
  • Reliable internet with minimum speeds of 50 Mbps download and 5 Mbps upload (for remote work).